Probability theory: elements
Peter Antal, antal@mit.bme.hu
A.I., February 17, 2017
Joint distribution
Conditional probability
Independence, conditional independence
Bayes rule
Marginalization/Expansion
Chain rule
Expectation, variance
Basic element: random variable
Similar to propositional logic: possible worlds are defined by assignment of values to random variables.
Boolean random variables, e.g., Cavity (do I have a cavity?)
Discrete random variables, e.g., Weather is one of <sunny, rainy, cloudy, snow>
Domain values must be exhaustive and mutually exclusive.
Elementary proposition constructed by assignment of a value to a random variable: e.g., Weather = sunny, Cavity = false (abbreviated as ¬cavity)
Complex propositions formed from elementary propositions and standard logical connectives, e.g., Weather = sunny ∨ Cavity = false
Atomic event: a complete specification of the state of the world about which the agent is uncertain.
E.g., if the world consists of only two Boolean variables Cavity and Toothache, then there are 4 distinct atomic events:
Cavity = false ∧ Toothache = false
Cavity = false ∧ Toothache = true
Cavity = true ∧ Toothache = false
Cavity = true ∧ Toothache = true
Atomic events are mutually exclusive and exhaustive.
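The enumeration of atomic events can be sketched in a few lines of Python. This is only an illustration: the dictionary name `domains` and the list name `atomic_events` are assumptions introduced here, not part of the slides.

```python
from itertools import product

# Domains of the two Boolean random variables from the slide.
domains = {"Cavity": [False, True], "Toothache": [False, True]}

# An atomic event assigns exactly one value to every random variable,
# so the atomic events are the Cartesian product of the domains.
atomic_events = [
    dict(zip(domains, values)) for values in product(*domains.values())
]

for event in atomic_events:
    print(event)
```

With d values per variable and n variables the list has d^n entries, which is why the full joint distribution grows so quickly.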
For any propositions A, B:
0 ≤ P(A) ≤ 1
P(true) = 1 and P(false) = 0
P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
Prior or unconditional probabilities of propositions, e.g., P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72, correspond to belief prior to arrival of any new evidence.
Probability distribution gives values for all possible assignments:
P(Weather) = <0.72, 0.1, 0.08, 0.1> (normalized, i.e., sums to 1)
Joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables.
P(Weather, Cavity) = a 4 × 2 matrix of values:

Weather =          sunny   rainy   cloudy   snow
Cavity = true      0.144   0.02    0.016    0.02
Cavity = false     0.576   0.08    0.064    0.08
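As a rough sketch, the Weather/Cavity joint distribution given above can be stored as a dictionary and checked for normalization; summing out one variable gives the distribution of the other. The names `joint`, `p_weather` and `p_cavity` are illustrative choices, not from the slides.

```python
import math

weather_values = ["sunny", "rainy", "cloudy", "snow"]

# Joint distribution P(Weather, Cavity), keyed by (weather, cavity),
# with the values of the 4 x 2 table on the slide.
joint = {
    ("sunny", True): 0.144, ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

# A joint distribution must be normalized: all entries sum to 1.
total = sum(joint.values())
print(math.isclose(total, 1.0))

# Marginalization: sum out the other variable.
p_weather = {w: joint[(w, True)] + joint[(w, False)] for w in weather_values}
p_cavity = {c: sum(joint[(w, c)] for w in weather_values) for c in (True, False)}
print(p_weather)
```

The marginal `p_weather` reproduces P(Weather) = <0.72, 0.1, 0.08, 0.1> up to floating-point rounding.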
Conditional or posterior probabilities
e.g., P(cavity | toothache) = 0.8
i.e., given that toothache is all I know
Notation for conditional distributions: P(Cavity | Toothache) = 2-element vector of 2-element vectors
If we know more, e.g., cavity is also given, then we have P(cavity | toothache, cavity) = 1
New evidence may be irrelevant, allowing simplification, e.g., P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8
This kind of inference, sanctioned by domain knowledge, is crucial.
Definition of conditional probability:
P(a | b) = P(a ∧ b) / P(b) if P(b) > 0
Product rule gives an alternative formulation:
P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
A general version holds for whole distributions, e.g.,
P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
(View as a set of 4 × 2 equations, not matrix multiplication.)
Chain rule is derived by successive application of the product rule:
P(X_1, ..., X_n) = P(X_1, ..., X_{n-1}) P(X_n | X_1, ..., X_{n-1})
= P(X_1, ..., X_{n-2}) P(X_{n-1} | X_1, ..., X_{n-2}) P(X_n | X_1, ..., X_{n-1})
= ...
= ∏_{i=1}^{n} P(X_i | X_1, ..., X_{i-1})
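The remark that the distribution-level product rule is a set of 4 × 2 scalar equations can be illustrated with a small check. The sketch below uses the Weather/Cavity joint values from the earlier slide; the variable names are assumptions made for the example.

```python
import math

weather_values = ["sunny", "rainy", "cloudy", "snow"]

# P(Weather, Cavity), keyed by (weather, cavity), values from the slides.
joint = {
    ("sunny", True): 0.144, ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

# P(Cavity) by summing out Weather.
p_cavity = {c: sum(joint[(w, c)] for w in weather_values) for c in (True, False)}

# Definition of conditional probability: P(w | c) = P(w, c) / P(c).
p_weather_given_cavity = {
    (w, c): joint[(w, c)] / p_cavity[c] for (w, c) in joint
}

# Product rule, one scalar equation per (w, c) pair: 4 x 2 = 8 equations.
product_rule_holds = all(
    math.isclose(p_weather_given_cavity[(w, c)] * p_cavity[c], joint[(w, c)])
    for (w, c) in joint
)
print(len(joint), product_rule_holds)
```

Each entry is multiplied elementwise, which is why the slide warns against reading the right-hand side as a matrix product.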
Every question about a domain can be answered by the joint distribution.
Start with the joint probability distribution.
For any proposition φ, sum the atomic events where it is true:
P(φ) = Σ_{ω : ω ⊨ φ} P(ω)
Example: P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
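Summing atomic events for a proposition can be sketched as follows. Note the assumptions: the four entries with toothache = true are the numbers used in these slides, while the four entries with toothache = false are NOT given in the text and are assumed here to be the values of the standard textbook dentist example, so that the table sums to 1. The helper name `prob` is hypothetical.

```python
import math

# Full joint P(Toothache, Catch, Cavity), keyed by (toothache, catch, cavity).
# toothache=True entries: from the slides.
# toothache=False entries: ASSUMED (standard textbook dentist example).
joint = {
    (True, True, True): 0.108, (True, False, True): 0.012,
    (True, True, False): 0.016, (True, False, False): 0.064,
    (False, True, True): 0.072, (False, False, True): 0.008,
    (False, True, False): 0.144, (False, False, False): 0.576,
}

def prob(proposition):
    """Sum P(omega) over the atomic events omega in which the proposition holds."""
    return sum(p for event, p in joint.items() if proposition(*event))

# P(toothache): uses only the entries that appear in the slides.
p_toothache = prob(lambda toothache, catch, cavity: toothache)

# P(cavity or toothache): also depends on the assumed entries.
p_cavity_or_toothache = prob(lambda toothache, catch, cavity: cavity or toothache)

print(math.isclose(p_toothache, 0.2))
```

Any proposition built from the three variables with logical connectives can be passed in as a Python predicate in the same way.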
Can also compute conditional probabilities from the joint probability distribution:
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.4
Denominator can be viewed as a normalization constant α:
P(Cavity | toothache) = α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [<0.108, 0.016> + <0.012, 0.064>]
= α <0.12, 0.08> = <0.6, 0.4>
General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables.
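The normalization step above can be reproduced numerically. This is a minimal sketch; the list names are illustrative, and each vector is ordered <cavity, ¬cavity> as on the slide.

```python
# P(Cavity, toothache, catch) and P(Cavity, toothache, not catch),
# each ordered <cavity, not cavity>, values from the slide.
with_catch = [0.108, 0.016]
without_catch = [0.012, 0.064]

# Sum out the hidden variable Catch.
unnormalized = [a + b for a, b in zip(with_catch, without_catch)]

# alpha = 1 / P(toothache) rescales the vector so that it sums to 1.
alpha = 1.0 / sum(unnormalized)
posterior = [alpha * value for value in unnormalized]

print(posterior)
```

The point of α is that P(toothache) never has to be computed separately: it falls out as the sum of the unnormalized entries.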
Typically, we are interested in the posterior joint distribution of the query variables Y given specific values e for the evidence variables E.
Let the hidden variables be H = X - Y - E.
Then the required summation of joint entries is done by summing out the hidden variables:
P(Y | E = e) = α P(Y, E = e) = α Σ_h P(Y, E = e, H = h)
The terms in the summation are joint entries because Y, E and H together exhaust the set of random variables.
Obvious problems:
1. Worst-case time complexity O(d^n), where d is the largest arity
2. Space complexity O(d^n) to store the joint distribution
3. How to find the numbers for O(d^n) entries?
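A possible sketch of this enumeration procedure is given below, simplified to a single query variable. The function name `enumerate_query` and its signature are hypothetical. The joint table reuses the dentist numbers; its toothache = false entries are not given in the slides and are assumed from the standard textbook example.

```python
from itertools import product

def enumerate_query(joint, variables, domains, query, evidence):
    """Return P(query | evidence) as a dict {value: probability}.

    joint maps full assignment tuples (ordered like `variables`) to
    probabilities; evidence maps variable names to observed values.
    Assumes P(evidence) > 0.
    """
    hidden = [v for v in variables if v != query and v not in evidence]
    unnormalized = {}
    for y in domains[query]:
        total = 0.0
        # Sum out the hidden variables: one joint entry per assignment h.
        for h_values in product(*(domains[h] for h in hidden)):
            assignment = dict(evidence)
            assignment[query] = y
            assignment.update(zip(hidden, h_values))
            total += joint[tuple(assignment[v] for v in variables)]
        unnormalized[y] = total
    alpha = 1.0 / sum(unnormalized.values())
    return {y: alpha * p for y, p in unnormalized.items()}

variables = ["Toothache", "Catch", "Cavity"]
domains = {v: [True, False] for v in variables}
# toothache=False entries are ASSUMED (standard textbook dentist example).
joint = {
    (True, True, True): 0.108, (True, False, True): 0.012,
    (True, True, False): 0.016, (True, False, False): 0.064,
    (False, True, True): 0.072, (False, False, True): 0.008,
    (False, True, False): 0.144, (False, False, False): 0.576,
}

posterior = enumerate_query(joint, variables, domains, "Cavity", {"Toothache": True})
print(posterior)
```

The inner loop visits every assignment of the hidden variables, which is where the O(d^n) worst-case time comes from.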
A and B are independent iff
P(A | B) = P(A) or P(B | A) = P(B) or P(A, B) = P(A) P(B)
P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)
32 entries reduced to 12; for n independent biased coins, O(2^n) → O(n)
Absolute independence is powerful but rare.
A and B are conditionally independent given C iff
P(A | B, C) = P(A | C) or P(B | A, C) = P(B | C) or P(A, B | C) = P(A | C) P(B | C)
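As an illustrative check, the Weather/Cavity table from the earlier slide happens to factorize into its marginals, and the 32-versus-12 entry count can be computed directly. The function name `is_independent` and the other names are assumptions made for this sketch.

```python
import math

weather_values = ["sunny", "rainy", "cloudy", "snow"]

# P(Weather, Cavity), keyed by (weather, cavity), values from the slides.
joint = {
    ("sunny", True): 0.144, ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}
p_weather = {w: joint[(w, True)] + joint[(w, False)] for w in weather_values}
p_cavity = {c: sum(joint[(w, c)] for w in weather_values) for c in (True, False)}

def is_independent(joint, p_a, p_b, tol=1e-9):
    """Check P(a, b) = P(a) P(b) for every pair of values (a, b)."""
    return all(
        math.isclose(p, p_a[a] * p_b[b], abs_tol=tol)
        for (a, b), p in joint.items()
    )

independent = is_independent(joint, p_weather, p_cavity)

# Table sizes: full joint versus the factored form
# P(Toothache, Catch, Cavity) P(Weather).
dental_sizes = [2, 2, 2]   # Toothache, Catch, Cavity
weather_size = 4
full_entries = math.prod(dental_sizes) * weather_size
factored_entries = math.prod(dental_sizes) + weather_size
print(independent, full_entries, factored_entries)
```

The saving grows with the number of independent factors: the product of table sizes is replaced by their sum.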
Bayes rule
P(Model | Data) ∝ P(Data | Model) P(Model)
P(X | Y) = P(Y | X) P(X) / P(Y) = P(Y | X) P(X) / Σ_{X'} P(Y | X') P(X')
An algebraic triviality
A scientific research paradigm
A practical method for inverting causal knowledge into a diagnostic tool
P(Cause | Effect) ∝ P(Effect | Cause) P(Cause)
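Inverting causal knowledge into a diagnostic answer can be sketched numerically with Bayes rule. The function name `bayes_posterior` is hypothetical. The inputs are derived from numbers used elsewhere in these slides: P(cavity) = 0.2 from the Weather/Cavity table, P(toothache | cavity) = 0.12 / 0.2 = 0.6 and P(toothache | ¬cavity) = 0.08 / 0.8 = 0.1.

```python
def bayes_posterior(prior, likelihood):
    """Return P(cause | effect) for every cause.

    prior maps cause -> P(cause); likelihood maps cause -> P(effect | cause).
    The denominator P(effect) is obtained by summing over all causes.
    """
    unnormalized = {c: likelihood[c] * prior[c] for c in prior}
    p_effect = sum(unnormalized.values())
    return {c: p / p_effect for c, p in unnormalized.items()}

# Causal direction: how likely is a toothache with and without a cavity?
prior = {"cavity": 0.2, "no cavity": 0.8}
likelihood = {"cavity": 0.6, "no cavity": 0.1}

# Diagnostic direction: how likely is a cavity given a toothache?
posterior = bayes_posterior(prior, likelihood)
print(posterior)
```

The result agrees with the posterior <0.6, 0.4> obtained earlier by normalizing joint entries, as it should, since both computations use the same joint distribution.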
Probability theory = measure theory + independence
I_P(X;Y|Z) or (X ⊥ Y | Z)_P denotes that X is independent of Y given Z:
P(X, Y | z) = P(Y | z) P(X | z) for all z with P(z) > 0.
(Almost) alternatively, I_P(X;Y|Z) iff P(X | Z, Y) = P(X | Z) for all z, y with P(z, y) > 0.
Other notations: D_P(X;Y|Z) =def= ¬I_P(X;Y|Z)
Contextual independence: for not all z.
Homeworks:
Intransitivity: show that it is possible that D(X;Y), D(Y;Z), but I(X;Z).
Order: show that it is possible that I(X;Z), I(Y;Z), but D(X,Y;Z).
Joint probability distribution specifies the probability of every atomic event.
Queries can be answered by summing over atomic events.