Introduction to Probabilistic Reasoning
Introduction to Probabilistic Reasoning: inference, forecasting, decision

Giulio D'Agostini (giulio.dagostini@roma1.infn.it)
Dipartimento di Fisica, Università di Roma "La Sapienza"

Part 3

"Probability is good sense reduced to a calculus" (Laplace)

G. D'Agostini, Probabilistic Reasoning [3] (Stellenbosch, 23 November 2013), © G. D'Agostini
Contents

- Summary of relevant formulae
- Propagating uncertainty the straight way
- Simple case of the Gaussian distribution: when (and why) maximum likelihood (ML) provides good estimates, and why it can never tell something meaningful about uncertainties
- The role of conjugate priors, with application to Bernoulli trials (the "binomial problem"): the Beta pdf
- Handling systematics, with the special case of two measurements, with Gaussian response of the same detector, affected by a possible systematic error: how the overall uncertainty increases; how the two results become correlated
- Examples with OpenBUGS (see web page) [→ JAGS]
Important formulae (for continuous uncertain variables)

- Product rule: $f_{x,y}(x, y \mid I) = f_{x \mid y}(x \mid y, I)\, f_y(y \mid I)$
- Independence: $f_{x,y}(x, y \mid I) = f_x(x \mid I)\, f_y(y \mid I)$
- Marginalization: $\int f_{x,y}(x, y \mid I)\, dy = f_x(x \mid I)$ and $\int f_{x,y}(x, y \mid I)\, dx = f_y(y \mid I)$
- Decomposition: $f_x(x \mid I) = \int f_{x \mid y}(x \mid y, I)\, f_y(y \mid I)\, dy$ and $f_y(y \mid I) = \int f_{y \mid x}(y \mid x, I)\, f_x(x \mid I)\, dx$
- Weighted average
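As a quick numeric sanity check, the rules above can be verified on a small discrete joint distribution; the 2x3 probability table below is made up purely for illustration.

```python
# A made-up normalized 2x3 joint table p(x, y) for x in {0, 1}, y in {0, 1, 2}
p_xy = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.30,
}

# marginalization: p(x) = sum_y p(x, y) and p(y) = sum_x p(x, y)
p_x = {x: sum(v for (xx, _), v in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(v for (_, yy), v in p_xy.items() if yy == y) for y in (0, 1, 2)}

# product rule: p(x, y) = p(x | y) p(y), with p(x | y) = p(x, y) / p(y)
for (x, y), v in p_xy.items():
    assert abs((v / p_y[y]) * p_y[y] - v) < 1e-12

# decomposition: p(x) = sum_y p(x | y) p(y)
for x in (0, 1):
    assert abs(sum((p_xy[(x, y)] / p_y[y]) * p_y[y] for y in (0, 1, 2)) - p_x[x]) < 1e-12

print(p_x, p_y)
```

The same identities hold verbatim for densities, with sums replaced by integrals.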
Important formulae (continued)

Bayes rule (using observation $x$ and true value $\mu$, and simplifying the notation):

$f(\mu \mid x, I) = \frac{f(\mu, x \mid I)}{f(x \mid I)} = \frac{f(x \mid \mu, I)\, f(\mu \mid I)}{f(x \mid I)} = \frac{f(x \mid \mu, I)\, f(\mu \mid I)}{\int f(x \mid \mu, I)\, f(\mu \mid I)\, d\mu} \propto f(x \mid \mu, I)\, f(\mu \mid I)$

(In the following, $I$ will be implicit in most cases.)
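The Bayes-rule chain above can be sketched numerically on a grid of $\mu$ values; the Gaussian likelihood with $\sigma = 1$ and the flat prior used here are illustrative assumptions.

```python
import math

sigma = 1.0      # assumed known width of the Gaussian response
x_obs = 2.0      # the single observation
grid = [i * 0.01 for i in range(-500, 1001)]    # mu values from -5.0 to 10.0

def likelihood(x, mu):
    # Gaussian f(x | mu); the 1/(sqrt(2*pi)*sigma) factor cancels in the normalization
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

prior = [1.0] * len(grid)                        # flat prior f(mu)
unnorm = [likelihood(x_obs, mu) * p for mu, p in zip(grid, prior)]
norm = sum(unnorm)                               # plays the role of f(x)
posterior = [u / norm for u in unnorm]

# With a flat prior the posterior peaks at the observed value.
mu_map = grid[posterior.index(max(posterior))]
print(mu_map)
```

Replacing the flat `prior` with any other list of non-negative weights reuses the same three lines of normalization, which is the whole content of the rule.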
Propagation of uncertainties

What has this to do with the Bayesian approach? Think of what most of you have been taught in laboratory courses (typically): using probabilistic formulae for objects that are not random variables (in the frequentist approach). An insult to logic!!

In the Bayesian approach it is straightforward, because true values are uncertain variables.

For details see the 2005 CERN lectures (nr. 4, pp. 7-28): conferencedisplay.py?confid=a04375
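A minimal sketch of what "true values are uncertain variables" buys you: sample the uncertain inputs and push the samples through the function; the distribution of the outputs is the propagated uncertainty. All numbers below are made up for illustration.

```python
import random

random.seed(1)
N = 100_000
# Made-up true values: a = 10.0 +/- 0.1, b = 5.0 +/- 0.2, and we want y = a * b.
a_samples = [random.gauss(10.0, 0.1) for _ in range(N)]
b_samples = [random.gauss(5.0, 0.2) for _ in range(N)]
y_samples = [a * b for a, b in zip(a_samples, b_samples)]

y_mean = sum(y_samples) / N
y_std = (sum((y - y_mean) ** 2 for y in y_samples) / (N - 1)) ** 0.5

# Linear propagation predicts sigma_y = sqrt((5*0.1)**2 + (10*0.2)**2) ~ 2.06
print(y_mean, y_std)
```

For mildly non-linear functions and small relative uncertainties this Monte Carlo agrees with the usual linear propagation formula; when it does not, the Monte Carlo is the one to trust.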
Back to our slides

- a very simple inference in a Gaussian model
- conjugate priors
- systematics
Simple case of Gaussian errors

$x \sim \mathcal{N}(\mu, \sigma)$:

$f(x \mid \mu, I) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$

$f(\mu \mid x, I) = \frac{\frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] f(\mu \mid I)}{\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] f(\mu \mid I)\, d\mu}$

IF $f(\mu \mid I) \approx$ const:

$f(\mu \mid x, I) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(\mu-x)^2}{2\sigma^2}\right]$

from which $E[\mu]$, $\sigma(\mu)$, etc. follow.

- maximum of posterior = maximum of likelihood
- frequentist equivalent of $\sigma(\mu)$? NO! prescriptions!!
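The flat-prior result above ($E[\mu] = x$, $\sigma(\mu) = \sigma$) can be checked on a grid; $x = 3.0$ and $\sigma = 0.5$ are arbitrary illustrative numbers.

```python
import math

x_obs, sigma = 3.0, 0.5       # illustrative observation and known sigma
step = 0.001
n = round(10 * sigma / step)                      # grid spans x_obs +/- 5 sigma
grid = [x_obs - 5 * sigma + i * step for i in range(n + 1)]

# Flat prior: the posterior is just the normalized Gaussian likelihood in mu.
weights = [math.exp(-(mu - x_obs) ** 2 / (2 * sigma ** 2)) for mu in grid]
norm = sum(weights)
post = [w / norm for w in weights]

e_mu = sum(mu * p for mu, p in zip(grid, post))
sd_mu = math.sqrt(sum((mu - e_mu) ** 2 * p for mu, p in zip(grid, post)))
print(e_mu, sd_mu)
```

The grid mean lands on the observed value and the grid standard deviation on $\sigma$, up to discretization and the (negligible) $\pm 5\sigma$ truncation.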
Conjugate priors

There have always been computational problems: one survives with approximations, e.g. the Gaussian approximation of the posterior (somehow a reflex of the central limit theorem).

Another famous approximation is to choose a convenient shape for the prior: a compromise between real beliefs and easy math!

Inferring $p$ of a binomial → black/green board
Beta distribution

$X \sim \text{Beta}(r, s)$:

$f(x \mid \text{Beta}(r,s)) = \frac{1}{\beta(r,s)}\, x^{r-1} (1-x)^{s-1}, \qquad r, s > 0, \quad 0 \le x \le 1.$

The denominator is just a normalization, i.e. the beta function

$\beta(r,s) = \int_0^1 x^{r-1} (1-x)^{s-1}\, dx = \frac{\Gamma(r)\,\Gamma(s)}{\Gamma(r+s)}.$

A very flexible distribution → file beta_distribution.pdf
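The practical payoff of the Beta as conjugate prior for Bernoulli trials is that the posterior update is pure bookkeeping: with a $\text{Beta}(r, s)$ prior on $p$ and $k$ successes in $n$ trials, the posterior is $\text{Beta}(r + k,\, s + n - k)$. A minimal sketch, with an assumed uniform $\text{Beta}(1,1)$ prior and illustrative data:

```python
# Uniform prior Beta(1, 1); k = 7 successes in n = 10 trials (illustrative data).
r, s = 1.0, 1.0
n, k = 10, 7

# Conjugate update: the binomial likelihood times a Beta prior is again a Beta.
r_post, s_post = r + k, s + (n - k)

mean = r_post / (r_post + s_post)               # posterior mean of p
mode = (r_post - 1) / (r_post + s_post - 2)     # posterior mode (needs r_post, s_post > 1)
print(r_post, s_post, mean, mode)
```

With the uniform prior the posterior mode reproduces the ML estimate $k/n$, while the mean $(r+k)/(r+s+n)$ is pulled slightly toward $1/2$, which is exactly the role of the prior.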
Uncertainties due to systematics

This is another subject where the frequentist approach fails miserably: no consistent theory, just prescriptions:

1. add them linearly;
2. add them quadratically;
3. do 1 if small, 2 if large; ...

Straightforward in the Bayesian approach: influence quantities on which the final result may depend (calibration constants, etc.) are characterized by an uncertain value, and hence we can attach probabilities to them and use probability theory. (This is what, in practice, also physicists who think they adhere to the frequentist school finally do, as in error propagation.)
1. Conditional inference

Let's make explicit one of the pieces of information:

$f(\mu \mid x_0, I) = f(\mu \mid x_0, h, I_0)$

[Figure: pdfs of the true value $\mu$ given $x_0$, with and without conditioning on $h$, and of the observation $x$ given the true value $\mu_0$, around the observed value $x_0$.]

Then use probability theory:

$f(\mu \mid x_0, I) = \int f(\mu \mid x_0, h, I_0)\, f(h \mid I_0)\, dh$

Easy, logical, it does work: try it!
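The decomposition over $h$ can be sketched numerically for a Gaussian measurement with an uncertain offset (all numbers are illustrative assumptions): conditional on $h$, $f(\mu \mid x_0, h) = \mathcal{N}(x_0 - h, \sigma)$, and integrating $h$ out should inflate the width to $\sqrt{\sigma^2 + \sigma_h^2}$.

```python
import math

# Illustrative numbers: observation x0, random error sigma, offset uncertainty sigma_h.
x0, sigma, sigma_h = 1.0, 0.3, 0.4
step = 0.01
mu_grid = [-1.5 + i * step for i in range(501)]   # symmetric around x0 = 1.0
h_grid = [-2.0 + i * step for i in range(401)]    # covers h = 0 +/- 5 sigma_h

def gauss(z, m, s):
    return math.exp(-(z - m) ** 2 / (2 * s ** 2)) / (math.sqrt(2 * math.pi) * s)

# f(mu | x0) = sum_h f(mu | x0, h) f(h) dh, with f(mu | x0, h) = N(x0 - h, sigma)
post = [sum(gauss(mu, x0 - h, sigma) * gauss(h, 0.0, sigma_h) for h in h_grid) * step
        for mu in mu_grid]
norm = sum(post) * step
post = [p / norm for p in post]

e_mu = sum(mu * p for mu, p in zip(mu_grid, post)) * step
sd_mu = math.sqrt(sum((mu - e_mu) ** 2 * p for mu, p in zip(mu_grid, post)) * step)
print(e_mu, sd_mu)
```

The marginal width comes out close to $\sqrt{0.3^2 + 0.4^2} = 0.5$: the uncertainty about $h$ has been folded into the result, which is the whole point of the decomposition.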
2. Joint inference + marginalization

We can infer both $\mu$ and $h$ from the data. (One piece of data, two inferred quantities? So what? They will just be 100% correlated.)

$f(\mu, h \mid x_0, I) \propto f(x_0 \mid \mu, h, I)\, f_0(\mu, h \mid I)$

And then apply marginalization with respect to the so-called nuisance parameter:

$f(\mu \mid x_0, I) = \int f(\mu, h \mid x_0, I)\, dh$
3. Raw result → corrected result

As a third possibility, we might think of a raw result, obtained at a fixed value of $h$ (its nominal, best value), affected only by uncertainties due to random errors (or all others, apart from those coming from our imperfect knowledge about $h$):

$f(\mu_R \mid x_0, I)$

Then think of a correction due to all possible values of $h$, consistent with our best knowledge:

$\mu = g(\mu_R, h)$   [$g(\mu_R, h)$ is not a pdf!]

and then use propagation of uncertainties.

Example: $\mu = \mu_R + z$, where $z$ is an offset known to be $0 \pm \sigma_z$ → one of the assigned problems
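The offset example can be sketched by Monte Carlo propagation; the raw result $5.0 \pm 0.2$ and $\sigma_z = 0.3$ are made-up numbers.

```python
import random

random.seed(7)
N = 100_000
mu_r, sigma_r = 5.0, 0.2     # made-up raw result mu_R = 5.0 +/- 0.2
sigma_z = 0.3                # offset z known to be 0 +/- sigma_z

# mu = mu_R + z: sample both uncertain ingredients and add them
samples = [random.gauss(mu_r, sigma_r) + random.gauss(0.0, sigma_z) for _ in range(N)]

mean = sum(samples) / N
std = (sum((x - mean) ** 2 for x in samples) / (N - 1)) ** 0.5
# Quadratic combination predicts sqrt(0.2**2 + 0.3**2) ~ 0.361
print(mean, std)
```

For this linear correction the Monte Carlo reproduces the quadratic combination of the two uncertainties, as expected for independent Gaussian contributions.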
Common systematics → correlation

As is well understood, if measurements have a systematic effect in common, the results become correlated, as happens whenever two quantities depend on a third one:

$\mu_1 = \mu_{R_1} + z$
$\mu_2 = \mu_{R_2} + z$

A common uncertainty in $z$ affects $\mu_1$ and $\mu_2$ in the same direction: $\mu_1$ and $\mu_2$ become positively correlated. → assigned problem

For a more detailed example, using reasoning 2, see file common_systematics.pdf (sections )
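How the shared offset induces the correlation can be seen in a short Monte Carlo sketch; the raw results 1.0 and 2.0, $\sigma_r = 0.1$ and $\sigma_z = 0.2$ are all illustrative.

```python
import random

random.seed(42)
N = 100_000
sigma_r, sigma_z = 0.1, 0.2   # illustrative raw and systematic uncertainties

mu1, mu2 = [], []
for _ in range(N):
    z = random.gauss(0.0, sigma_z)            # shared systematic offset
    mu1.append(random.gauss(1.0, sigma_r) + z)
    mu2.append(random.gauss(2.0, sigma_r) + z)

# Pearson correlation, computed by hand
m1, m2 = sum(mu1) / N, sum(mu2) / N
cov = sum((a - m1) * (b - m2) for a, b in zip(mu1, mu2)) / N
s1 = (sum((a - m1) ** 2 for a in mu1) / N) ** 0.5
s2 = (sum((b - m2) ** 2 for b in mu2) / N) ** 0.5
rho = cov / (s1 * s2)
# Expected correlation: sigma_z**2 / (sigma_r**2 + sigma_z**2) = 0.8
print(rho)
```

The larger the common systematic relative to the individual random errors, the closer the correlation gets to 1; with no shared $z$ it would vanish.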
Conclusions

Take the courage to use it! It's easy if you try...!
More informationLecture 4: Probabilistic Learning
DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods
More informationECE295, Data Assimila0on and Inverse Problems, Spring 2015
ECE295, Data Assimila0on and Inverse Problems, Spring 2015 1 April, Intro; Linear discrete Inverse problems (Aster Ch 1 and 2) Slides 8 April, SVD (Aster ch 2 and 3) Slides 15 April, RegularizaFon (ch
More informationFourier and Stats / Astro Stats and Measurement : Stats Notes
Fourier and Stats / Astro Stats and Measurement : Stats Notes Andy Lawrence, University of Edinburgh Autumn 2013 1 Probabilities, distributions, and errors Laplace once said Probability theory is nothing
More informationPrimer on statistics:
Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood
More informationTime Series and Dynamic Models
Time Series and Dynamic Models Section 1 Intro to Bayesian Inference Carlos M. Carvalho The University of Texas at Austin 1 Outline 1 1. Foundations of Bayesian Statistics 2. Bayesian Estimation 3. The
More informationStatistical Methods in Particle Physics
Statistical Methods in Particle Physics Lecture 11 January 7, 2013 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2012 / 13 Outline How to communicate the statistical uncertainty
More informationStat 451 Lecture Notes Numerical Integration
Stat 451 Lecture Notes 03 12 Numerical Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange 2 Updated: February 11, 2016 1 / 29
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Uncertainty & Probabilities & Bandits Daniel Hennes 16.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Uncertainty Probability
More informationIntroduction to Machine Learning
Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB
More informationStatistical Methods in Particle Physics. Lecture 2
Statistical Methods in Particle Physics Lecture 2 October 17, 2011 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2011 / 12 Outline Probability Definition and interpretation Kolmogorov's
More informationStatistics and Data Analysis
Statistics and Data Analysis The Crash Course Physics 226, Fall 2013 "There are three kinds of lies: lies, damned lies, and statistics. Mark Twain, allegedly after Benjamin Disraeli Statistics and Data
More informationBayesian Inference. Chris Mathys Wellcome Trust Centre for Neuroimaging UCL. London SPM Course
Bayesian Inference Chris Mathys Wellcome Trust Centre for Neuroimaging UCL London SPM Course Thanks to Jean Daunizeau and Jérémie Mattout for previous versions of this talk A spectacular piece of information
More informationLearning Bayesian belief networks
Lecture 4 Learning Bayesian belief networks Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Administration Midterm: Monday, March 7, 2003 In class Closed book Material covered by Wednesday, March
More informationHuman-Oriented Robotics. Probability Refresher. Kai Arras Social Robotics Lab, University of Freiburg Winter term 2014/2015
Probability Refresher Kai Arras, University of Freiburg Winter term 2014/2015 Probability Refresher Introduction to Probability Random variables Joint distribution Marginalization Conditional probability
More informationDiscrete Binary Distributions
Discrete Binary Distributions Carl Edward Rasmussen November th, 26 Carl Edward Rasmussen Discrete Binary Distributions November th, 26 / 5 Key concepts Bernoulli: probabilities over binary variables Binomial:
More informationLecture 13: Data Modelling and Distributions. Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1
Lecture 13: Data Modelling and Distributions Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1 Why data distributions? It is a well established fact that many naturally occurring
More informationUniversity of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians
Engineering Part IIB: Module F Statistical Pattern Processing University of Cambridge Engineering Part IIB Module F: Statistical Pattern Processing Handout : Multivariate Gaussians. Generative Model Decision
More information