Decision Making under Uncertainty
Prof. Ahti Salo, Helsinki University of Technology (e-learning resources, http://www.dm.hut.fi)

Contents
- Meanings of uncertainty
- Interpretations of probability
- Biases in probability elicitation
- Calibration of experts
- Updating of probabilities: discrete and continuous

Meanings of uncertainty
We frequently make statements about uncertainty:
- "It will rain tomorrow." (subjective probability)
- "The 100,000th decimal of π is 6." (a fact; the uncertainty lies in the available information)
- "I win in a lottery with probability 0.00005." (frequentist or classical probability interpretation)
Uncertainty = an event with unknown outcome. Probability = a number for measuring uncertainty.

Interpretations of probability: Classical interpretation (P. S. Laplace, 1825)
Probability = the ratio of the number of possible outcomes favourable to the event to the total number of possible outcomes, each assumed to be equally likely:
P(A) = #(A) / #(S),
where #(A) is the number of possible outcomes favourable to A and #(S) is the total number of possible outcomes.
- The definition is circular: probability is defined in terms of "equally likely".
- Principle of indifference: events are equally likely if there is no known reason for predicting the occurrence of one event rather than another.
- Each event is defined as a collection of outcomes.
- Example: the probability of getting a 6 when tossing a die is 1/6.
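To make the counting definition P(A) = #(A)/#(S) concrete, here is a minimal Python sketch (not part of the original slides) that enumerates the equally likely outcomes of tossing two dice and counts those favourable to the event "the sum is 7":

```python
from itertools import product
from fractions import Fraction

# Sample space S: all equally likely outcomes of tossing two dice.
S = list(product(range(1, 7), repeat=2))

# Event A: the sum of the two dice is 7.
A = [outcome for outcome in S if sum(outcome) == 7]

# Classical probability: P(A) = #(A) / #(S).
p_A = Fraction(len(A), len(S))
print(p_A)  # 1/6
```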

Interpretations of probability: Frequentist interpretation (Leslie Ellis, mid 19th century)
Probability = the relative frequency of trials in which the favourable event occurs as the number of trials approaches infinity:
P(A) = lim_{n→∞} n(A) / n,
where n(A) is the number of times that A occurs and n is the total number of trials.
Example: you may determine the probability of getting heads by tossing a coin a very large number of times.

Interpretations of probability: Subjective (Bayesian) interpretation (De Finetti, 1937)
Probability represents an individual's degree of belief in the occurrence of a particular outcome.
- The probability may change, e.g. when additional information is received.
- The event may have already occurred.
Examples: "I believe there is a 50 % chance that it will rain tomorrow." "I'm 15 % sure that Martin Luther King was 34 years old when he died."

Elicitation of subjective probabilities
Prerequisite: the events must be well defined.
Goal: a probability number for any outcome of interest.
One can use 1) past evidence, 2) causal models, 3) expert judgement.

Elicitation of discrete subjective probabilities
1) Direct assessment: ask the respondent to assign numerical values to events.
2) Betting approach: find a specific amount to win or lose such that the DM is indifferent about which side of the bet to take.
3) Reference lottery: compare the uncertain event to events with known probabilities.
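A minimal simulation sketch (not from the slides) of the frequentist interpretation: the relative frequency of heads in repeated tosses of a fair coin settles near P(heads) = 0.5 as the number of trials grows.

```python
import random

# Relative frequency of heads after n tosses of a fair coin; as n grows,
# n(A)/n approaches P(A) = 0.5 (the frequentist interpretation).
random.seed(1)
heads = 0
for n in range(1, 100_001):
    heads += random.random() < 0.5
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n = {n:>6}: relative frequency = {heads / n:.4f}")
```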

Direct assessment
The respondent is asked to assess the desired probabilities directly. A wheel of fortune (probability wheel) with sectors for events A, B and C can be used to support the elicitation process. The mode of questioning may affect the results.

Betting approach
Bet 1 (bet for A): win X if A happens, lose Y if A does not happen.
Bet 2 (bet against A): lose X if A happens, win Y if A does not happen.
Adjust X and Y until the respondent is indifferent about which bet to take. At indifference the expected monetary values of the bets must be equal*:
X·P(A) − Y·(1 − P(A)) = −X·P(A) + Y·(1 − P(A)),
which gives
P(A) = Y / (X + Y).
(A numerical sketch of this calculation follows the Ellsberg example question below.)
* The respondent is assumed to be concerned with expected monetary value only (i.e. risk neutral).

Reference lottery
Lottery: win X if A happens, win Y if A does not happen (X > Y).
Reference lottery: win X with probability p, win Y with probability (1 − p).
The reference lottery can be visualised e.g. with a wheel of fortune, or with a box of red and blue balls from which you want to draw a red one; the probability is changed by adjusting the wheel or the number of balls. Once the respondent is indifferent about which lottery to participate in, we have P(A) = p. The respondent's risk attitude does not affect the result.

The Ellsberg paradox (1961) (1/2)
The urn contains 30 red balls and 60 balls that are either blue or yellow, in an unknown proportion.
Which game would you choose, 1 or 2?
Game 1: win 1000 if you pick a red ball.
Game 2: win 1000 if you pick a blue ball.
What about 3 or 4?
Game 3: win 1000 if you pick a red or yellow ball.
Game 4: win 1000 if you pick a blue or yellow ball.
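As referenced above, a minimal Python sketch (not from the slides) of the betting-approach calculation: given hypothetical stakes X and Y at which the respondent is indifferent, it returns the implied probability P(A) = Y/(X + Y) and checks that the two bets then have equal expected monetary value.

```python
def implied_probability(x: float, y: float) -> float:
    """P(A) implied by indifference between betting for and against A
    with stakes X (win if A happens) and Y (win if A does not happen)."""
    return y / (x + y)

# Hypothetical stakes at which the respondent is indifferent.
x, y = 75.0, 25.0
p = implied_probability(x, y)

# The expected values of the two bets coincide at the implied probability.
ev_for     =  x * p - y * (1 - p)
ev_against = -x * p + y * (1 - p)
print(p, ev_for, ev_against)  # 0.25 0.0 0.0
```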

The Ellsberg paradox (1961) (2/2)
Most people choose games 1 and 4. But the yellow balls should not matter:

Wins:     Red    Blue   Yellow
Game 1    1000   0      0
Game 2    0      1000   0
Game 3    1000   0      1000
Game 4    0      1000   1000

Ambiguity aversion: people tend to prefer events with known probabilities.

Assessment of continuous subjective probabilities (1/3): Fractile method
The expert is asked to give
- the feasible range (min, max),
- the median f_50% (i.e. P(X < f_50%) = 0.5),
- other fractiles (e.g. 5 %, 25 %, 75 %, 95 %).
Typical elicitation questions:
- Min: "What is the smallest possible value you could imagine the variable X obtaining?"
- Median (f_50%): "Give a number f_50% such that, in your opinion, X < f_50% is just as probable as X > f_50%." (i.e. P(X < f_50%) = P(X > f_50%))
- 5 % fractile (f_5%): "Give a number f_5% such that, in your opinion, X < f_5% is just as probable as picking a red ball from an urn with 1 red and 19 blue balls." (i.e. P(X < f_5%) = 0.05)

Assessment of continuous subjective probabilities (2/3)
A cumulative distribution function is obtained e.g. by interpolating between the fractiles or by fitting a curve from a specific class of functions (see the interpolation sketch below).
[Figure: elicited fractiles and a fitted cumulative distribution; cumulative probability plotted against the variable value.]

Assessment of continuous subjective probabilities (3/3): Histogram method
The possible values of the variable are divided into intervals, and the probability of the event corresponding to each interval is assessed. The result is a histogram, i.e. an estimate of the density function. An anchoring bias may occur (see the bias section).
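A minimal sketch (not from the slides) of turning elicited fractiles into an approximate cumulative distribution by linear interpolation; the fractile values used here are hypothetical.

```python
import numpy as np

# Hypothetical elicited fractiles for a variable X: (cumulative prob, value).
probs  = np.array([0.0, 0.05, 0.25, 0.50, 0.75, 0.95, 1.0])   # min ... max
values = np.array([1.0, 2.0, 3.5, 5.0, 6.5, 9.0, 12.0])

def cdf(x):
    """Piecewise-linear CDF interpolated through the elicited fractiles."""
    return np.interp(x, values, probs)

def quantile(p):
    """Inverse of the interpolated CDF."""
    return np.interp(p, probs, values)

print(cdf(5.0))        # 0.5, by construction (the elicited median)
print(quantile(0.90))  # value with 90 % of the probability mass below it
```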

Example: Histogram method
How much will GDP (volume) grow in Finland in 2004? (Average growth in 1997-2001 was 4.5 %.) The elicited probabilities:

Growth (%)   Probability
-5 - 0       0.10
 0 - 1       0.15
 1 - 2       0.30
 2 - 3       0.20
 3 - 4       0.12
 4 - 5       0.08
 5 - 10      0.05

Scoring rules
With scoring rules, e.g. the compensation of the respondent can be tied to the accuracy of his reply in such a way that he benefits from reporting his true estimate. A scoring rule S(r) is strictly proper if the respondent's true opinion maximises his expected score:
E[S(p)] > E[S(r)] for all r ≠ p,
where p = (p_1, p_2, ..., p_n) is the true opinion and r = (r_1, r_2, ..., r_n) is the reported distribution.
The three most commonly encountered proper scoring rules are (S_j is the score if event j occurs):
- Quadratic: S_j(r_1, ..., r_n) = 2 r_j − Σ_{i=1}^n r_i²
- Logarithmic: S_j(r_1, ..., r_n) = log(r_j)
- Spherical: S_j(r_1, ..., r_n) = r_j / (Σ_{i=1}^n r_i²)^{1/2}

Example: Scoring rules
A weather forecaster believes there is a p = 0.7 probability of rain the next day. Which probability r should he report, when he gets S_1(r) = 2r − r² − (1 − r)² if it does rain and S_2(r) = 2(1 − r) − r² − (1 − r)² if it does not? His expected compensation is
E[S(r)] = 0.7·S_1(r) + 0.3·S_2(r),
which is maximised when he reports r = 0.7, his true estimate (see the numerical check below).
[Figure: expected compensation as a function of the reported probability r, peaking at r = 0.7.]

Biases in probability elicitation
People use rules of thumb, i.e. heuristics, in assessing probabilities. These are often useful, but they may lead to systematically incorrect estimates, i.e. biases. Examples: gamblers believe their luck is about to change; you toss HHHHHHH (H = heads), what comes next? Is P(HHHHTTTT) < P(HTTHHTHT)?
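A minimal numerical check (not from the slides) of the forecaster example: the expected quadratic score is computed over a grid of reported probabilities and is maximised exactly at the true belief p = 0.7.

```python
import numpy as np

p = 0.7  # the forecaster's true probability of rain

def expected_score(r, p=p):
    """Expected quadratic (Brier-type) score when reporting r but believing p."""
    s_rain    = 2 * r - r**2 - (1 - r)**2        # score if it rains
    s_no_rain = 2 * (1 - r) - r**2 - (1 - r)**2  # score if it does not rain
    return p * s_rain + (1 - p) * s_no_rain

r_grid = np.linspace(0, 1, 1001)
best_r = r_grid[np.argmax(expected_score(r_grid))]
print(f"{best_r:.2f}")  # 0.70 -- honest reporting maximises the expected score
```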

Biases in probability elicitation: Representativeness
If x fits the description of a set A well, then P(x ∈ A) is assumed to be large; the relative proportions (prior probabilities) are not taken into consideration.
Example: You see a very good-looking young woman in a bar. Is she more probably a professional model or a nurse? Many people are tempted to answer "a model". Yet professional models are rare, while there are far more nurses: she is actually more likely to be a nurse.

Biases in probability elicitation: Availability
People assess the probability of an event by the ease with which instances or occurrences can be brought to mind.
Example: In a typical sample of English text, is it more likely that a word starts with the letter K or that K is the third letter? About 70 % of people think that words starting with K are more common. In truth, there are approximately twice as many words with K in the third position as words that begin with it.

Biases in probability elicitation: Anchoring and adjustment
In several situations people assess probabilities by adjusting a given number. Often the adjustment is not big enough and the final assessment stays too close to the starting value.
Example: Is the percentage of African countries in the UN
a) greater or less than 65? What is the exact percentage? Average answer: less, 45 %.
b) greater or less than 10? What is the exact percentage? Average answer: greater, 25 %.
How to reduce: avoid giving starting values.

Biases in probability elicitation: Conservatism
People tend to change previous probability estimates more slowly than warranted by new data (according to Bayes' theorem).
Example: Suppose we have two bags. One contains 30 white balls and 10 black balls; the other contains 30 black balls and 10 white. We choose one of the bags at random and draw five balls from it at random, replacing each ball after it has been drawn. The result is 4 white balls and 1 black. What is the probability that we were using the bag with mainly white balls? A typical subject answers between 0.7 and 0.8; the answer is in fact 0.96, which can be calculated using Bayes' theorem (see the sketch below).
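A minimal check (not from the slides) of the conservatism example using Bayes' theorem: with equal priors on the two bags, the posterior probability of the mainly-white bag after observing 4 white and 1 black draw (with replacement) is about 0.96.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(k successes in n independent draws with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

prior_white_bag = 0.5                     # the bag is chosen at random
like_white_bag  = binom_pmf(4, 5, 0.75)   # mainly white bag: P(white) = 30/40
like_black_bag  = binom_pmf(4, 5, 0.25)   # mainly black bag: P(white) = 10/40

posterior = (like_white_bag * prior_white_bag) / (
    like_white_bag * prior_white_bag + like_black_bag * (1 - prior_white_bag)
)
print(round(posterior, 3))  # 0.964
```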

Biases in probability elicitation: Hindsight bias
People falsely believe they would have predicted the outcome of an event. Once outcomes are observed, the DM may assume they are the only ones that could have happened and underestimate the uncertainty. This prevents us from learning from the past, and warning people of this bias has little effect.
How to reduce: argue against the inevitability of the reported outcome and convince the assessor that it might have turned out otherwise.

Biases in probability elicitation: Overconfidence
People tend to be overconfident in their assessments. According to research:

Claimed confidence   Relative frequency of correct answers
100 %                80 %
 90 %                75 %
 80 %                65 %

How to reduce: give feedback according to the quality of earlier assessments.

Calibration of experts
Goal: to know that if the DM says the probability is p, the real probability is f(p). Define a calibration curve p → f(p). Calibration can be done e.g. by eliciting known probabilities.
An expert assesses a cumulative subjective probability distribution function F_X(x) for a variable X prior to its realisation. When the outcome x* becomes known, define ζ = F_X(x*). Then ζ should follow the uniform distribution, ζ ~ U(0, 1).

Example: Calibration of experts (1/2)
On 10 recent occasions, a technical expert assessed normal probability distributions for the performance index x_i of a series of research experiments. The values ζ_i are calculated from the cumulative predictive distributions:
ζ_i = F_N(x_i | µ_i, σ_i).
With perfect calibration the (sorted) ζ_i should be approximately 0.1, 0.2, ..., 0.9, 1.0 (see the numerical sketch below).
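A minimal sketch (not from the slides) of the calibration check ζ_i = F_N(x_i | µ_i, σ_i); the elicited means, standard deviations and realised outcomes below are purely illustrative.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical elicited normal distributions (mu_i, sigma_i) and realised
# outcomes x_i for 10 experiments -- illustrative numbers only.
mu    = np.array([5.0, 3.2, 7.1, 4.4, 6.0, 2.8, 5.5, 4.9, 6.7, 3.9])
sigma = np.array([1.0, 0.8, 1.5, 1.2, 0.9, 0.7, 1.1, 1.0, 1.4, 0.6])
x     = np.array([5.4, 2.1, 8.9, 4.0, 6.1, 3.5, 5.0, 6.2, 7.3, 3.8])

# Probability integral transform: zeta_i = F_N(x_i | mu_i, sigma_i).
zeta = norm.cdf(x, loc=mu, scale=sigma)

# Under perfect calibration the sorted zeta_i are roughly evenly spread on (0, 1).
print(np.round(np.sort(zeta), 2))
```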

Example: Calibration of experts (2/2)
From the ζ_i the calibration function can be drawn. If the expert assessed a distribution for an eleventh experiment, it could be corrected using the calibration function.
[Figure: the calibration function (left) and the assessed distribution for the eleventh experiment (right), corrected using the calibration function.]
Assessed probability distributions can thus be improved, i.e. updated, when new information is gained.

Updating of discrete probabilities
1. We have a probability estimate for event H: the prior probability P(H).
2. New information D is gained.
3. Update the estimate using Bayes' theorem: the posterior probability P(H | D).

The Bayes theorem
The updating is done using Bayes' theorem:
P(H | D) = P(D | H) P(H) / P(D).
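A minimal sketch (not from the slides) of discrete Bayesian updating with P(D) obtained by summing over the hypotheses; the prior and likelihood values are hypothetical.

```python
def bayes_update(prior: dict, likelihood: dict) -> dict:
    """Posterior P(H | D) from prior P(H) and likelihood P(D | H),
    with P(D) obtained by summing over all hypotheses."""
    p_d = sum(likelihood[h] * prior[h] for h in prior)
    return {h: likelihood[h] * prior[h] / p_d for h in prior}

# Hypothetical two-hypothesis example.
prior = {"H": 0.3, "not H": 0.7}
likelihood = {"H": 0.8, "not H": 0.1}   # P(D | H), P(D | not H)
print(bayes_update(prior, likelihood))  # {'H': ~0.774, 'not H': ~0.226}
```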

Example: Using Bayes' theorem
1.5 % of the population suffer from schizophrenia: P(S) = 0.015 (prior probability). Brain atrophy is found in 30 % of schizophrenics, P(A | S) = 0.3, and in 2 % of normal people, P(A | ¬S) = 0.02. If a person has brain atrophy, the probability that he is schizophrenic (the posterior probability) is
P(S | A) = P(A | S) P(S) / (P(A | S) P(S) + P(A | ¬S) P(¬S))
         = 0.3 · 0.015 / (0.3 · 0.015 + 0.02 · 0.985) = 0.186.
[Figure (Clemen, p. 250): posterior probability with different prior probabilities.]

Updating of continuous distributions (1/3)
1. Choose a theoretical distribution P(X = x | θ) for the physical process of interest.
2. Assess the uncertainty about the parameter θ: the prior distribution f(θ).
3. Observe data x_1.
4. Update using Bayes' theorem: the posterior distribution of θ, f(θ | x_1).
Note: uncertainty about X has two parts: 1) the process itself, P(X = x | θ), and 2) uncertainty about θ, f(θ), later updated to f(θ | x_1).

Updating of continuous distributions (2/3)
Bayes' theorem for continuous θ:
f(θ | x_1) = f(x_1 | θ) f(θ) / ∫ f(x_1 | θ) f(θ) dθ.
Here f(x_1 | θ) is called the likelihood function of θ given the observed data x_1. In most cases the posterior distribution cannot be calculated analytically but must be solved numerically (a grid-based sketch follows below).

Updating of continuous distributions (3/3)
With natural conjugate distributions the posterior distribution can be solved analytically. If the distribution of the physical process and the prior distribution of its parameter are suitably chosen, the prior and posterior distributions of the parameter have the same form:

Physical process     Prior for the parameter    Posterior
X ~ Exp(λ)           λ ~ Gamma(α, β)            λ ~ Gamma(α*, β*)
X ~ Bin(n, p)        p ~ Beta(r_0, n_0)         p ~ Beta(r*, n*)
X ~ N(µ, σ²)         µ ~ N(m_0, σ_0²)           µ ~ N(m*, σ*²)
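A minimal grid-approximation sketch (not from the slides) of the numerical case: the posterior f(θ | x_1) ∝ f(x_1 | θ) f(θ) is evaluated on a grid and normalised. The prior and data here are hypothetical.

```python
import numpy as np

# Grid approximation of f(theta | x1) proportional to f(x1 | theta) * f(theta).
# Hypothetical example: a Bernoulli success probability theta with a
# triangular prior peaking at 0.5, updated on 7 successes in 10 trials.
theta = np.linspace(0.001, 0.999, 999)       # grid over the parameter
prior = 1.0 - 2.0 * np.abs(theta - 0.5)      # unnormalised triangular prior
likelihood = theta**7 * (1 - theta)**3       # f(x1 | theta)

posterior = prior * likelihood
posterior /= posterior.sum()                 # normalise over the grid

posterior_mean = float((theta * posterior).sum())
print(round(posterior_mean, 3))
```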

Binomial distribution
1. The physical process of interest is binomially distributed: X ~ Bin(n, p).
2. The prior distribution for p is p ~ Beta(r_0, n_0):
f_β(p) = Γ(n_0) / (Γ(r_0) Γ(n_0 − r_0)) · p^(r_0 − 1) (1 − p)^(n_0 − r_0 − 1), 0 < p < 1, r_0, n_0 > 0.
3. Observe a sample of the physical process: sample size n_1, number of favourable cases r_1.
4. The posterior distribution, calculated using Bayes' theorem, reduces to
p ~ Beta(r_0 + r_1, n_0 + n_1).

About the beta distribution
The beta distribution is very flexible. [Figure: some example beta density shapes.]

Normal distribution
1. The physical process of interest is normally distributed: X ~ N(µ, σ²), where σ is assumed to be known.
2. The prior distribution for µ is µ ~ N(m_0, σ_0²), with the notation σ_0² = σ² / n_0.
3. Observe a sample of the physical process: sample size n_1, sample mean x̄_1.
4. The posterior distribution, calculated using Bayes' theorem, reduces to µ ~ N(m*, σ*²), where
m* = (m_0 σ² + n_1 x̄_1 σ_0²) / (σ² + n_1 σ_0²) = (n_0 m_0 + n_1 x̄_1) / (n_0 + n_1),
σ*² = σ_0² σ² / (σ² + n_1 σ_0²) = σ² / (n_0 + n_1).
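A minimal sketch (not from the slides) of the two conjugate updates above, using the slides' (r_0, n_0) parameterisation of the beta prior and the equivalent-sample-size form m_0, n_0 of the normal prior; all numbers are hypothetical.

```python
def beta_update(r0, n0, r1, n1):
    """Beta(r0, n0) prior for a binomial success probability p, updated on
    r1 favourable cases out of n1 trials: posterior Beta(r0 + r1, n0 + n1)."""
    return r0 + r1, n0 + n1

def normal_update(m0, n0, xbar1, n1):
    """Normal prior mean m0 with equivalent sample size n0 (sigma0^2 = sigma^2/n0),
    updated on a sample of size n1 with mean xbar1.
    Returns the posterior mean m* and n* = n0 + n1 (so sigma*^2 = sigma^2/n*)."""
    m_star = (n0 * m0 + n1 * xbar1) / (n0 + n1)
    return m_star, n0 + n1

# Hypothetical data.
print(beta_update(r0=2, n0=10, r1=7, n1=20))          # (9, 30), i.e. Beta(9, 30)
print(normal_update(m0=5.0, n0=4, xbar1=6.2, n1=16))  # (5.96, 20)
```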