Axioms for Probability Logic

P(b|a) = measure of the plausibility of proposition b conditional on the information stated in proposition a.

For propositions a, b and c:

P1: $P(b \mid a) \ge 0$

P2: $P(b \mid a \,\&\, b) = 1$

P3: $P(b \mid a) + P(\sim\! b \mid a) = 1$

P4: $P(c \,\&\, b \mid a) = P(c \mid b \,\&\, a)\, P(b \mid a)$

Properties Derived from Axioms P1-P4

P5: (a) $P(\sim\! b \mid a \,\&\, b) = 0$, (b) $P(b \mid a) \in [0, 1]$

Proof: (a) Follows from P3 with $a \to (a \,\&\, b)$ and then using P2.
(b) Follows from P3 & P1: $P(b \mid a) = 1 - P(\sim\! b \mid a) \le 1$.

P6: (a) $P(c \text{ or } b \mid a) = P(c \mid a) + P(b \mid a) - P(c \,\&\, b \mid a)$

(b) $P(c \text{ or } b \mid a) = P(c \mid a) + P(b \mid a)$ if a implies that $(c \,\&\, b)$ is false, i.e. b and c cannot both be true and so are mutually exclusive.

(c) If proposition a implies that only one of the propositions $b_1, b_2, \ldots, b_N$ can be true, i.e. they are mutually exclusive, then:
$$P(b_1 \text{ or } b_2 \text{ or } \cdots \text{ or } b_N \mid a) = \sum_{n=1}^{N} P(b_n \mid a)$$
If, in addition, proposition a implies that one must be true:
$$\sum_{n=1}^{N} P(b_n \mid a) = 1$$

Proof: (a) From De Morgan's Law, $c \text{ or } b \Leftrightarrow \sim(\sim\! c \,\&\, \sim\! b)$, so:

$$
\begin{aligned}
P(c \text{ or } b \mid a) &= P(\sim(\sim\! c \,\&\, \sim\! b) \mid a)\\
&\overset{\text{P3}}{=} 1 - P(\sim\! c \,\&\, \sim\! b \mid a)\\
&\overset{\text{P4}}{=} 1 - P(\sim\! c \mid \sim\! b \,\&\, a)\, P(\sim\! b \mid a)\\
&\overset{\text{P3}}{=} 1 - [1 - P(c \mid \sim\! b \,\&\, a)]\, P(\sim\! b \mid a)\\
&\overset{\text{P4}}{=} 1 - P(\sim\! b \mid a) + P(c \,\&\, \sim\! b \mid a)\\
&\overset{\text{P3,P4}}{=} P(b \mid a) + P(\sim\! b \mid c \,\&\, a)\, P(c \mid a)\\
&\overset{\text{P3}}{=} P(b \mid a) + [1 - P(b \mid c \,\&\, a)]\, P(c \mid a)\\
&\overset{\text{P4}}{=} P(b \mid a) + P(c \mid a) - P(b \,\&\, c \mid a)
\end{aligned}
$$

(b) Follows immediately since $P(c \,\&\, b \mid a) = 0$.

(c) Exercise: use the Principle of Mathematical Induction.

P7: If proposition a implies that one, and only one, of the propositions $b_1, \ldots, b_N$ is true, then:

(a) $\displaystyle P(c \mid a) = \sum_{n=1}^{N} P(c \,\&\, b_n \mid a)$  [Marginalization Theorem]

(b) $\displaystyle P(c \mid a) = \sum_{n=1}^{N} P(c \mid b_n \,\&\, a)\, P(b_n \mid a)$  [Total Probability Theorem]

(c) For $n = 1, \ldots, N$:
$$P(b_n \mid c \,\&\, a) = \frac{P(c \mid b_n \,\&\, a)\, P(b_n \mid a)}{\sum_{m=1}^{N} P(c \mid b_m \,\&\, a)\, P(b_m \mid a)} \quad \text{[Bayes' Theorem]}$$

Proof: (a) Let $b = b_1 \text{ or } b_2 \text{ or } \cdots \text{ or } b_N$; then since a implies that b is true, $a = a \,\&\, b$, so from P2, $P(b \mid a) = 1$, and
$$P(c \,\&\, b \mid a) \overset{\text{P4}}{=} P(c \mid b \,\&\, a)\, P(b \mid a) = P(c \mid a)$$
(because $P(b \mid a) = 1$ and $b \,\&\, a = a$), so
$$P(c \mid a) = P(c \,\&\, (b_1 \text{ or } \cdots \text{ or } b_N) \mid a) = P((c \,\&\, b_1) \text{ or } \cdots \text{ or } (c \,\&\, b_N) \mid a)$$
Since a implies that $b_n$ and $b_m$ cannot both be true if $n \ne m$, it must be that $(c \,\&\, b_n) \,\&\, (c \,\&\, b_m) = c \,\&\, (b_n \,\&\, b_m)$ is false, i.e. $(c \,\&\, b_n)$ and $(c \,\&\, b_m)$ are mutually exclusive. From P6(c) with $b_n \to c \,\&\, b_n$:
$$P(c \mid a) = \sum_{n=1}^{N} P(c \,\&\, b_n \mid a)$$

(b) Directly from (a) using P4.

(c) From (b), the denominator on the RHS is $P(c \mid a)$, so we need only show that $P(b_n \mid c \,\&\, a)\, P(c \mid a) = P(c \mid b_n \,\&\, a)\, P(b_n \mid a)$. This follows from P4 since both are equal to $P(b_n \,\&\, c \mid a) = P(c \,\&\, b_n \mid a)$.
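As a quick numerical sanity check (not part of the original notes), the Marginalization, Total Probability, and Bayes results P7(a)-(c) can be exercised on a small finite example. The Python sketch below assumes an illustrative partition $b_1, b_2, b_3$ with made-up probabilities; all probabilities are implicitly conditional on the background proposition a.

```python
# Toy check of P7(a)-(c); all probabilities implicitly conditional on 'a'.
# The numbers for P(b_n|a) and P(c|b_n & a) are illustrative assumptions.

p_b = [0.2, 0.5, 0.3]            # P(b_n|a): mutually exclusive, exhaustive
p_c_given_b = [0.9, 0.4, 0.1]    # P(c|b_n & a)

# P7(a)/(b): Marginalization / Total Probability Theorem.
p_c = sum(pc * pb for pc, pb in zip(p_c_given_b, p_b))

# P7(c): Bayes' Theorem for each b_n.
p_b_given_c = [pc * pb / p_c for pc, pb in zip(p_c_given_b, p_b)]

print(f"P(c|a) = {p_c:.4f}")                          # 0.4100
print("P(b_n|c&a) =", [round(v, 4) for v in p_b_given_c])
print(f"posterior sum = {sum(p_b_given_c):.4f}")      # 1.0000, as P6(c) requires
```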

Derivation of Kolmogorov's Axioms for the Probability Measure of a Set

Recall that the axioms for probability logic are expressed in terms of propositions a, b and c as:

P1: $P(b \mid a) \ge 0$

P2: $P(b \mid a \,\&\, b) = 1$

P3: $P(b \mid a) + P(\sim\! b \mid a) = 1$

P4: $P(c \,\&\, b \mid a) = P(c \mid b \,\&\, a)\, P(b \mid a)$

These axioms imply the property:

P6: $P(c \text{ or } b \mid a) = P(c \mid a) + P(b \mid a) - P(c \,\&\, b \mid a)$

Consider a real-valued quantity whose value x is uncertain, but we know that $x \in X$, the set of possible values of the quantity, and assume that X is finite. Let $\pi$ denote the proposition that specifies the probability model for the quantity, so $\pi$ states that $x \in X$ and gives the probability (degree of plausibility) of the quantity having the value x for each $x \in X$. Let $A \subset X$; then from axiom P1:

K1: $P(x \in A \mid \pi) \ge 0$

From axiom P2:

K2: $P(x \in X \mid \pi) = 1$ (since $\pi$ states $x \in X$)

Also, from P6:
$$P(x \in A \cup B \mid \pi) = P(x \in A \text{ or } x \in B \mid \pi) = P(x \in A \mid \pi) + P(x \in B \mid \pi) - P(x \in A \,\&\, x \in B \mid \pi) = P(x \in A \mid \pi) + P(x \in B \mid \pi) - P(x \in A \cap B \mid \pi)$$

If $A, B \subset X$ are disjoint, i.e. $A \cap B = \emptyset$ (the null set), then $x \in A$ and $x \in B$ are mutually exclusive, so:

K3: $P(x \in A \cup B \mid \pi) = P(x \in A \mid \pi) + P(x \in B \mid \pi)$

Introduce the shortened notation $P(A)$ for $P(x \in A \mid \pi)$ where $A \subset X$, i.e. leave the conditioning on $\pi$ implicit; then we can rewrite the above as:

K1′: $P(A) \ge 0, \ \forall A \subset X$

K2′: $P(X) = 1$

K3′: $P(A \cup B) = P(A) + P(B), \ \forall A, B \subset X$ with $A \cap B = \emptyset$

These are like A. Kolmogorov's axioms in Foundations of the Theory of Probability, 1950.

In fact, we could define a real-valued function P, called a probability measure, on all the subsets of X by $P(A) = P(x \in A \mid \pi), \ \forall A \subset X$. We would then have exactly Kolmogorov's axioms K1′-K3′ in terms of his probability measure. In Kolmogorov's presentation, the set function P(A) is not given an operational interpretation. Probability logic gives it an interpretation in terms of the degree of plausibility that the uncertain quantity has a value $x \in A$, conditional on the probability model specified by the proposition $\pi$.

We have not yet applied axiom P4, but:
$$P(x \in A \cap B \mid \pi) = P(x \in A \,\&\, x \in B \mid \pi) = P(x \in A \mid x \in B \,\&\, \pi)\, P(x \in B \mid \pi)$$
Introducing the abbreviated notation again:
$$P(A \cap B) = P(A \mid B)\, P(B)$$
Note that in any probability $P(b \mid a)$, proposition a cannot be self-contradictory, so we cannot have $P(B) = P(x \in B \mid \pi) = 0$, because this would mean that $\pi \Rightarrow \sim(x \in B)$, and so the proposition $(x \in B \,\&\, \pi)$ appearing as conditioning information would be self-contradictory. Thus $P(B) > 0$, and so:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
In Kolmogorov's approach, this appears as a definition of conditional probability, but in the probability logic approach all probabilities are conditional from the outset, and so the corresponding result appears as axiom P4.

Note: In Kolmogorov's approach, the uncertain-valued variable x is called a random variable, the set X of possible values of x is called the sample space, and the subsets of X are called events. We have little use for this terminology because probability logic has a much wider scope: its domain is propositions. Also, we try to avoid the vague words "deterministic" and "random" in analyzing systems and natural phenomena, usually preferring instead "complete information" and "incomplete (or partial) information", respectively. When there is missing information, we choose a probability model to give the probability of each possibility in a set of conceived possibilities.
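To make the correspondence concrete, here is a small sketch (added for illustration, not from the original notes) that builds the induced set function $P(A) = \sum_{x \in A} f(x)$ on an assumed finite X and checks K1′-K3′ over all pairs of disjoint subsets:

```python
from itertools import chain, combinations

# An assumed finite set X and probability model f (any non-negative f
# summing to 1 over X would do).
X = [1, 2, 3, 4]
f = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

def P(A):
    """Induced probability measure: P(A) = sum of f(x) over x in A."""
    return sum(f[x] for x in A)

# All subsets of X, including the empty set.
subsets = [set(s) for s in chain.from_iterable(
    combinations(X, r) for r in range(len(X) + 1))]

assert all(P(A) >= 0 for A in subsets)       # K1': P(A) >= 0
assert abs(P(set(X)) - 1.0) < 1e-9           # K2': P(X) = 1
for A in subsets:                            # K3': additivity on disjoint sets
    for B in subsets:
        if not (A & B):
            assert abs(P(A | B) - (P(A) + P(B))) < 1e-9
print("K1'-K3' hold for this model.")
```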

Probability Models for Discrete Variables

Def: If the set X of possible values of a variable x is finite, then the variable is discrete. (The terminology is also used if X is countably infinite, but here we focus on finite X.)

Convention: Introduce the shortened notation $P(A \mid \pi) = P(x \in A \mid \pi), \ \forall A \subset X$, i.e. think of A as also representing the proposition that $x \in A$. Also, write $P(x \mid \pi)$ for $P(\{x\} \mid \pi), \ \forall x \in X$, i.e. when the subset $A = \{x\}$.

Here, $\pi$ is a proposition specifying the probability model for the quantity whose uncertain value x is a discrete variable. Thus, $\pi$ specifies the set X of possible values of x and some function $f: X \to [0, 1]$ such that $P(x \mid \pi) = f(x), \ \forall x \in X$.

Let $X = \{x_1, \ldots, x_N\} = \{x_n\}$ and let $A \subset X$; then by repeated use of K3:
$$P(A \mid \pi) = \sum_{x \in A} P(x \mid \pi) = \sum_{x \in A} f(x)$$
because the singleton sets $\{x_n\}$ and $\{x_m\}$, $n \ne m$, are disjoint (i.e. the propositions $x = x_n$ and $x = x_m$ are mutually exclusive). In particular, using K2:
$$1 = P(X \mid \pi) = \sum_{n=1}^{N} P(x_n \mid \pi) = \sum_{n=1}^{N} f(x_n)$$
so this normalization property must be satisfied by the probability model.
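For a concrete instance (added for illustration): with the fair-die model $X = \{1, \ldots, 6\}$ and $f(x) = 1/6$, the probability of an even outcome is
$$P(\{2, 4, 6\} \mid \pi) = f(2) + f(4) + f(6) = \tfrac{1}{2}, \qquad \sum_{n=1}^{6} f(x_n) = 6 \cdot \tfrac{1}{6} = 1$$
so the normalization property holds.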

Example to Illustrate Concepts

Suppose a gambler offers you a wager that in N rolls of a die, a 6 appears more than some specified $n_0$ times. Before accepting the wager, you want to compute the probability that you will win, so you make a stochastic predictive analysis. Let x denote the number of 6's in N rolls; then you want to compute:
$$P(\text{You win} \mid \pi) = P(x \le n_0 \mid \pi)$$
where $\pi$ specifies an appropriate probability model. You reason that on a roll, each face is equally plausible to appear (i.e. you see no reason to believe any number on the die to be more likely to show up). Also, you feel that knowing what number appeared on a roll has no influence on what will appear on the next roll, i.e. the information from one roll is irrelevant to predicting what will happen on another roll. Therefore, you choose the binomial model, so $\pi$ represents the proposition that the set of possible values for x is $X = \{0, 1, \ldots, N\}$ and that:
$$P(x \mid \pi) = \binom{N}{x} \left(\tfrac{1}{6}\right)^x \left(\tfrac{5}{6}\right)^{N - x}, \quad \forall x \in X, \quad \text{where } \binom{N}{x} = \frac{N!}{x!\,(N - x)!}$$
(a combinatorial coefficient). You win if $x \in A = \{0, 1, \ldots, n_0\} \subset X$, and so $P(\text{You win} \mid \pi) = P(A \mid \pi) = \sum_{x \in A} P(x \mid \pi)$.
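This calculation is easy to carry out numerically. A minimal Python sketch follows (not part of the original notes; the values N = 20 and $n_0$ = 3 are illustrative assumptions):

```python
from math import comb

N, n0 = 20, 3  # assumed wager: you win if at most n0 sixes appear in N rolls

def p_x_given_pi(x, N, theta=1/6):
    """Binomial model: P(x | pi) = C(N, x) * theta^x * (1 - theta)^(N - x)."""
    return comb(N, x) * theta**x * (1 - theta)**(N - x)

# P(You win | pi) = sum of P(x | pi) over the winning set A = {0, 1, ..., n0}.
p_win = sum(p_x_given_pi(x, N) for x in range(n0 + 1))
print(f"P(You win | pi) = {p_win:.4f}")
```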

After computing this probability, you are told that the gambler is dishonest and may have an altered die. He showed you only one face and it was a 6, so you now wonder whether one or more of the other five faces have a 6. You decide to revise your predictive analysis as follows. [To make this scenario plausible, assume you only see the top face on each roll.]

Let $\pi_k$ specify the binomial probability model for the case where exactly k faces on the die have a 6; then for $k = 1, \ldots, 5$:
$$P(x \mid \pi_k) = \binom{N}{x}\, \theta_k^x (1 - \theta_k)^{N - x}, \quad \forall x \in X$$
where $\theta_k = k/6$ is the probability of rolling a six. [Note that for $\pi_6$, all faces are 6's, so $x = N$ with certainty, i.e. $P(x \mid \pi_6) = \delta_{xN}$, the Kronecker delta.] Since you are unsure which proposition $\pi_k$, $k = 1, \ldots, 6$, is the appropriate one to assume true, you choose a probability model for these propositions:
$$P(\pi_k \mid M) = g(\theta_k), \quad k = 1, \ldots, 6$$
where M is a proposition specifying each $\pi_k$ and your choice of probability model, $g(\theta_k)$. Now you calculate:
$$P(\text{You win} \mid M) = P(A \mid M) = \sum_{x \in A} P(x \mid M)$$
and use the Total Probability Theorem P7(b) to get:
$$P(x \mid M) = \sum_{k=1}^{6} P(x \mid \pi_k \,\&\, M)\, P(\pi_k \mid M)$$
because M implies that one, and only one, of $\pi_1, \ldots, \pi_6$ is true. Note that the first factor is the prediction of the k-th model and the second factor is the probability of the k-th model. Also, $P(x \mid \pi_k \,\&\, M) = P(x \mid \pi_k)$ because $\pi_k$ states the probability of each $x \in X$, and hence M is irrelevant, i.e. $P(x \mid \pi_k)$ is independent of M. (A numerical sketch of this model-averaged prediction is given after the notes below.)

Notes:

1) When we write $P(c \mid b \,\&\, a) = P(c \mid a)$, we are stating that, given a, the information stated by b is irrelevant to the probability of c, i.e. probabilistic independence is about information independence, and it should not be confused with causal independence when the propositions refer to the occurrence of actual events.

2) There are many possible choices for M. You may feel that each $\pi_k$ is equally plausible, so $P(\pi_k \mid M) = \tfrac{1}{6}$, $k = 1, \ldots, 6$. But you may feel that the gambler is unlikely to have a die with a 6 on every face, because you would get suspicious if every roll gave a 6, so you choose $P(\pi_k \mid M)$ to be a decreasing function of k, subject to the constraint:
$$\sum_{k=1}^{6} P(\pi_k \mid M) = 1$$
Your calculated probability of winning, and hence your decision whether to wager with the gambler, is conditional on your choice of M, and this is inescapable. You cannot get certainty in the choice of $\pi_k$ in the absence of information about how many faces of the die have a 6. You could take different choices for M, say $M_j$, $j = 1, \ldots, J$, and let $M'$ specify a probability model for them, i.e. $P(M_j \mid M') = h(j)$, $j = 1, \ldots, J$; then:
$$P(x \mid M') = \sum_{j=1}^{J} P(x \mid M_j)\, P(M_j \mid M') \quad \text{[Total Probability Theorem]}$$

3) $\pi_k$ specifies a probability model for x: $P(x \mid \pi_k) = f_k(x), \ \forall x \in X$, and M specifies a probability model for the $\pi_k$: $P(\pi_k \mid M) = g(\theta_k)$, $k = 1, \ldots, 6$. However, introducing the notation $f_k(x)$ and $g(\theta_k)$ is really unnecessary, because we can use $P(x \mid \pi_k)$ and $P(\pi_k \mid M)$ instead in our analysis.
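Here is the promised sketch of the model-averaged (prior predictive) probability, continuing the illustrative values from before and assuming the uniform choice $g(\theta_k) = 1/6$:

```python
from math import comb

N, n0 = 20, 3  # same assumed wager as before

def binom_pmf(x, n, theta):
    """P(x | pi_k): binomial probability of x sixes in n rolls."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

thetas = [k / 6 for k in range(1, 7)]   # theta_k = k/6 for pi_1, ..., pi_6
prior = [1 / 6] * 6                     # P(pi_k | M): assumed uniform g

# Prior predictive via the Total Probability Theorem P7(b):
# P(x | M) = sum_k P(x | pi_k) P(pi_k | M); pi_6 forces x = N (Kronecker delta).
def p_x_given_M(x):
    total = sum(binom_pmf(x, N, th) * p for th, p in zip(thetas[:5], prior[:5]))
    return total + (1.0 if x == N else 0.0) * prior[5]

p_win = sum(p_x_given_M(x) for x in range(n0 + 1))
print(f"P(You win | M) = {p_win:.4f}")
```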

Using Data to Update the Probability of Probability Models

We use the gambling example to examine how we can use data D to learn more about the probability models. Suppose the gambler says, after 2 rolls giving a 3 and a 6, that if you want, you can increase your bet. Therefore, you wish to calculate how the information from the data $D = \{x = 1, n = 2\}$ (one 6 in two rolls) changes your probability of winning, based on your choice of the class of probability models for x specified by M, as before. With $A' = \{0, 1, \ldots, n_0 - 1\}$ (one 6 has already occurred, so you win if the number of 6's in the remaining rolls is at most $n_0 - 1$):
$$P(\text{You win} \mid D, M) = P(A' \mid D, M) = \sum_{x' \in A'} P(x' \mid D, M)$$
where the shorthand D, M means D & M (a common convention) and x′ is the number of 6's in the remaining (N − 2) rolls of the die. From the Total Probability Theorem:
$$P(x' \mid D, M) = \sum_{k=1}^{6} P(x' \mid \pi_k)\, P(\pi_k \mid D, M)$$
because
$$P(x' \mid \pi_k, D, M) = P(x' \mid \pi_k) = \binom{N - 2}{x'}\, \theta_k^{x'} (1 - \theta_k)^{N - 2 - x'}$$
(i.e. D and M are irrelevant because $\pi_k$ specifies the probability of x′). This has the same form as before, except that the original probability $P(\pi_k \mid M)$ is updated to $P(\pi_k \mid D, M)$ because of the new information specified by D. Applying Bayes' Theorem P7(c) (with $b_n = \pi_k$, $c = D$ and $a = M$):
$$P(\pi_k \mid D, M) = c^{-1}\, P(D \mid \pi_k, M)\, P(\pi_k \mid M), \quad k = 1, \ldots, 6$$
where $c = \sum_{k=1}^{6} P(D \mid \pi_k, M)\, P(\pi_k \mid M)$ (it ensures that $\sum_{k=1}^{6} P(\pi_k \mid D, M) = 1$) and:
$$P(D \mid \pi_k, M) = 2\theta_k (1 - \theta_k) \quad \text{(one 6 in two rolls)}, \qquad P(\pi_k \mid M) = g(\theta_k), \quad k = 1, \ldots, 6$$
Thus, Bayes' Theorem gives:
$$P(\pi_k \mid D, M) = 2 c^{-1}\, \theta_k (1 - \theta_k)\, g(\theta_k)$$
[Posterior to data] = [Influence of data] × [Prior to data]

[Note that $P(D \mid \pi_6, M) = 0$, as expected, since you now know that not all faces of the die have a 6.]
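A short Python sketch of this update (illustrative; the uniform prior $g(\theta_k) = 1/6$ is an assumption):

```python
# Bayes update of the model probabilities P(pi_k | D, M) after observing
# D = {one 6 in two rolls}.
thetas = [k / 6 for k in range(1, 7)]
prior = [1 / 6] * 6                                  # assumed g(theta_k)

likelihood = [2 * th * (1 - th) for th in thetas]    # P(D | pi_k, M)
c = sum(l * p for l, p in zip(likelihood, prior))    # normalizing constant
posterior = [l * p / c for l, p in zip(likelihood, prior)]

for k, post in enumerate(posterior, start=1):
    print(f"P(pi_{k} | D, M) = {post:.4f}")          # pi_6 gets posterior 0
```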

Substituting into the Total Probability Theorem (the factor 2 cancels between numerator and denominator, and the $\pi_6$ term vanishes):
$$P(x' \mid D, M) = \frac{\sum_{k=1}^{5} \binom{N - 2}{x'}\, \theta_k^{x' + 1} (1 - \theta_k)^{N - 1 - x'}\, g(\theta_k)}{\sum_{k=1}^{5} \theta_k (1 - \theta_k)\, g(\theta_k)}$$
We say that $P(x' \mid D, M)$ is the posterior predictive probability and $P(x \mid M)$ is the prior predictive probability, meaning after and before, respectively, the information in the data is utilized.
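Putting the pieces together, a final sketch (same assumed values N = 20, $n_0$ = 3 and uniform g as before) computes the posterior predictive probability of winning, which can be compared with the prior predictive value computed earlier:

```python
from math import comb

N, n0 = 20, 3                              # assumed wager, as before
thetas = [k / 6 for k in range(1, 6)]      # pi_6 is ruled out by the data
g = [1 / 6] * 5                            # assumed uniform prior weights

weights = [th * (1 - th) * gk for th, gk in zip(thetas, g)]
posterior = [w / sum(weights) for w in weights]      # P(pi_k | D, M)

def binom_pmf(x, n, theta):
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Posterior predictive: P(x' | D, M) = sum_k P(x' | pi_k) P(pi_k | D, M);
# you now win if x' <= n0 - 1, since one 6 has already occurred.
p_win = sum(binom_pmf(x, N - 2, th) * post
            for x in range(n0)             # x' in A' = {0, ..., n0 - 1}
            for th, post in zip(thetas, posterior))
print(f"P(You win | D, M) = {p_win:.4f}")
```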