Estimation of reliability parameters from experimental data (Part 2) Prof. Enrico Zio


This lecture
Life test (t_1, t_2, ..., t_n): estimate θ of f_T(t|θ). For example: λ of f_T(t) = λ e^{-λt}.
Classical approach (frequentist probability definition) vs. Bayesian approach (subjective probability definition).

The Bayesian Subjective Probability Framework
P(E|K) is the degree of belief of the assigner with regard to the occurrence of E (a numerical encoding of the state of knowledge K of the assessor).
When can P(E) be considered objective from the scientific point of view? De Finetti: objective = coherent: it uses the total body of knowledge and it complies with the theory of probability.
Bayes' theorem is used to update the probability assignment in light of new data:
P(A_i|E) = P(E|A_i) P(A_i) / P(E) = P(E|A_i) P(A_i) / Σ_j P(E|A_j) P(A_j)
(the old probability P(A_i) and the new data, entering through P(E|A_i), give the updated probability P(A_i|E))

The Bayesian Subjective Probability Framework
Rare events. Frequentist school: we cannot associate a probability to them. E.g., a pump of a NPP with constant failure rate λ, 10000 hours of operation, 0 failures:
λ̂_MLE = k/TTT = 0/10000 h⁻¹ = 0
Bayesian school: we can associate a probability to them based on expert judgment, and then, as evidence is collected, we can update the probability using Bayes' theorem.
Repeatability: Bayesian school: two assessors can be coherent and still disagree.

The Bayesian approach to parameter estimation
θ = parameter of the failure time distribution f_T(t; θ). θ is a random quantity (epistemic uncertainty).
The assessor provides a probability distribution of θ based on his/her knowledge, experience, ...: P(θ) = prior distribution (subjective probability).
When experimental evidence becomes available, i.e. a sample of failure times E = {t_1, t_2, ..., t_n}, the estimate of θ is updated by using Bayes' theorem:
P(θ|E) = P(θ) P(E|θ) / P(E) = posterior distribution

Bayes' Formula
P(θ|E) = P(θ) P(E|θ) / P(E)
P(E|θ) = L(θ) is the likelihood of the evidence E
P(E) = Σ_i P(E|θ_i) P(θ_i) (discrete case) or ∫_θ P(E|θ) P(θ) dθ (continuous case), by the theorem of total probability
Hence P(θ|E) = k P(θ) L(θ), where k = 1/P(E) is a normalization constant.
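As a minimal illustration of the formula (not part of the original lecture), the following Python sketch approximates the continuous case on a grid of candidate failure rates, with a flat prior and hypothetical exponential failure times; the normalization step plays the role of dividing by P(E).

```python
import numpy as np

# Grid of candidate failure rates lambda (per hour); flat prior over the grid
lam = np.linspace(1e-6, 1e-2, 2000)
prior = np.ones_like(lam) / lam.size

# Hypothetical evidence: a few failure times (hours), exponential model assumed
t_obs = np.array([120.0, 340.0, 95.0])
likelihood = np.prod(lam[:, None] * np.exp(-lam[:, None] * t_obs[None, :]), axis=1)

posterior = prior * likelihood
posterior /= posterior.sum()          # divide by P(E): grid version of the integral
print(lam[np.argmax(posterior)])      # posterior mode, close to n / sum(t_obs)
```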

Observations
Aleatory uncertainty on the failure time: t ~ P(t|θ). Example: the failure time distribution is an exponential distribution, t ~ f_T(t|λ) = λ e^{-λt}.
Epistemic uncertainty on the parameter value, conditional on the background knowledge K (expert judgment, experimental data, ...): θ ~ P(θ|K).
The epistemic uncertainty can be updated through Bayes' theorem: as the evidence increases, the background knowledge K improves and the epistemic uncertainty reduces.

Comparing Bayesian and frequentist approaches (parameter estimation)
Frequentist: parameter = fixed, unknown number; inference = ad hoc estimation methods (e.g. MLE); source of information = experimental data.
Bayesian: parameter = random variable; inference = Bayesian updating, a logical extension of the theory of probability; source of information = expert judgment + experimental data.

Exercise 1: Bayes' Theorem
You feel that the frequency of heads, θ, on tossing a particular coin is either 0.4, 0.5 or 0.6. Your prior probabilities are:
P(θ_1 = 0.4) = 0.1
P(θ_2 = 0.5) = 0.7
P(θ_3 = 0.6) = 0.2
You toss the coin just once and the toss results in a tail: E = {tail}.
Questions:
1. Update the probability of θ.
2. Consider the denominator of Bayes' theorem and interpret it.

Exercise 1: Solution (I)
We apply Bayes' theorem to revise the prior probabilities after the evidence E that the toss results in a tail:
P(θ_i|E) = P(θ_i) P(E|θ_i) / P(E)
Likelihood: P(E|θ_i) = P(tail|θ_i) = 1 - θ_i:
P(E|θ_1 = 0.4) = 1 - 0.4 = 0.6
P(E|θ_2 = 0.5) = 1 - 0.5 = 0.5
P(E|θ_3 = 0.6) = 1 - 0.6 = 0.4
Denominator: P(E) = Σ_{i=1}^{3} P(θ_i) P(E|θ_i) = 0.1 × 0.6 + 0.7 × 0.5 + 0.2 × 0.4 = 0.49
Posterior:
P(θ_1 = 0.4|E) = P(θ_1) P(E|θ_1) / P(E) = 0.1 × 0.6 / 0.49 = 0.1224
P(θ_2 = 0.5|E) = P(θ_2) P(E|θ_2) / P(E) = 0.7 × 0.5 / 0.49 = 0.7143
P(θ_3 = 0.6|E) = P(θ_3) P(E|θ_3) / P(E) = 0.2 × 0.4 / 0.49 = 0.1633

Exercise 1: Solution (II)
The denominator P(E) = Σ_{i=1}^{3} P(θ_i) P(E|θ_i) can be interpreted as the probability of obtaining a tail in a single toss, based on our prior knowledge.
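The same computation can be checked numerically; this is a small sketch (not from the lecture) reproducing the numbers of Exercise 1 with NumPy.

```python
import numpy as np

theta = np.array([0.4, 0.5, 0.6])        # candidate frequencies of heads
prior = np.array([0.1, 0.7, 0.2])
likelihood = 1.0 - theta                 # P(tail | theta_i)
p_evidence = np.sum(prior * likelihood)  # denominator: prior probability of a tail
posterior = prior * likelihood / p_evidence
print(p_evidence)   # 0.49
print(posterior)    # approximately [0.1224, 0.7143, 0.1633]
```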

Exercise 2
Suppose that a production manager is concerned about the items produced by a certain manufacturing process. More specifically, he is concerned about the proportion of these items that are defective. From past experience with the process, he feels that θ, the proportion of defectives, can take only four possible values: 0.01, 0.05, 0.10 and 0.25. Moreover, he has observed the process and he has some information concerning θ. This information can be summarized in terms of the following probabilities, which constitute the production manager's prior distribution of θ:
P(θ = 0.01) = 0.60
P(θ = 0.05) = 0.30
P(θ = 0.10) = 0.08
P(θ = 0.25) = 0.02
The production manager assumes that the process can be thought of as a Bernoulli process, the assumptions of stationarity and independence appearing reasonable. That is, the probability that any one item is defective remains constant for all items produced and is independent of the past history of defectives from the process.
A sample of n = 5 items is taken from the production process, and k = 1 of the 5 is found to be defective. How can this information be combined with the prior information?

Exercise 2: Solution
The sampling distribution of the number of defectives in 5 trials, given any particular value of θ, is a binomial distribution. The likelihoods are thus:
P(k = 1|n = 5, θ = 0.01) = C(5,1) (0.01)^1 (0.99)^4 = 0.0480
P(k = 1|n = 5, θ = 0.05) = C(5,1) (0.05)^1 (0.95)^4 = 0.2036
P(k = 1|n = 5, θ = 0.10) = C(5,1) (0.10)^1 (0.90)^4 = 0.3280
P(k = 1|n = 5, θ = 0.25) = C(5,1) (0.25)^1 (0.75)^4 = 0.3955
Bayes' theorem can be written in the form:
P(θ_i|E) = P(θ_i) P(E|θ_i) / P(E)
where E represents the sample result, 1 defective in 5 trials.

Exercise 2: Solution
The posterior probabilities are summarized in the following table:

θ       Prior P(θ)   Likelihood P(E|θ)   Prior × Likelihood   Posterior P(θ|E)
0.01    0.60         0.0480              0.0288               0.232
0.05    0.30         0.2036              0.0611               0.492
0.10    0.08         0.3280              0.0262               0.212
0.25    0.02         0.3955              0.0079               0.064
Total   1.00                             P(E) = 0.1240        1.000
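A short sketch (not part of the original slides) that reproduces the posterior probabilities above, using scipy's binomial pmf for the likelihoods.

```python
import numpy as np
from scipy.stats import binom

theta = np.array([0.01, 0.05, 0.10, 0.25])
prior = np.array([0.60, 0.30, 0.08, 0.02])
likelihood = binom.pmf(k=1, n=5, p=theta)     # P(k = 1 | n = 5, theta)
p_evidence = np.sum(prior * likelihood)       # P(E)
posterior = prior * likelihood / p_evidence
for th, po in zip(theta, posterior):
    print(f"theta = {th:.2f}  posterior = {po:.3f}")
# Expected (approximately): 0.232, 0.492, 0.212, 0.064
```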

Bayesian approach: continuous updating of the parameter distribution
E_{n-1} = {t_1, t_2, ..., t_{n-1}}: P(θ|E_{n-1}) has already been updated (it will be our prior for the next updating).
t_n = new evidence; E_n = {t_1, t_2, ..., t_n}.
P(θ|E_{n-1}, t_n) = P(θ, E_{n-1}, t_n) / P(E_{n-1}, t_n) = P(θ, t_n|E_{n-1}) P(E_{n-1}) / P(E_{n-1}, t_n)
= P(t_n|θ, E_{n-1}) P(θ|E_{n-1}) P(E_{n-1}) / P(E_{n-1}, t_n)
= P(θ|E_{n-1}) P(t_n|θ) / P(t_n|E_{n-1})
with P(t_n|E_{n-1}) = ∫ P(θ|E_{n-1}) P(t_n|θ) dθ
where P(t_n|θ, E_{n-1}) = P(t_n|θ), since, given θ, the new observation t_n is independent of the previous evidence E_{n-1}.

Bayesian approach: coherence
Multiple stage updating: starting from P(θ), the evidence t_1, t_2, ..., t_n is processed one observation at a time, with one Bayesian updating at each step:
P(θ) → P(θ|t_1) → P(θ|t_1, t_2) → ... → P(θ|t_1, ..., t_{n-1}) → P(θ|t_1, ..., t_n)
Single cumulative updating: the same evidence t_1, t_2, ..., t_n is processed all at once, with one Bayesian updating:
P(θ) → P(θ|t_1, ..., t_n)
Same evidence, same posterior.

Updating on t_{n+2} after t_{n+1} (multiple stage updating)
P(θ|E_{n+2}) = P(θ|E_{n+1}) P(t_{n+2}|θ) / P(t_{n+2}|E_{n+1})
with:
P(t_{n+2}|E_{n+1}) = P(t_{n+2}|E_n, t_{n+1}) = P(t_{n+2}, t_{n+1}|E_n) / P(t_{n+1}|E_n)   (conditional probability)
First updating:
P(θ|E_{n+1}) = P(θ|E_n) P(t_{n+1}|θ) / P(t_{n+1}|E_n)
Substituting:
P(θ|E_{n+2}) = P(θ|E_{n+1}) P(t_{n+2}|θ) / P(t_{n+2}|E_{n+1}) = P(θ|E_n) [P(t_{n+1}|θ) / P(t_{n+1}|E_n)] P(t_{n+2}|θ) [P(t_{n+1}|E_n) / P(t_{n+2}, t_{n+1}|E_n)]
P(θ|E_{n+2}) = P(θ|E_n) P(t_{n+1}|θ) P(t_{n+2}|θ) / P(t_{n+2}, t_{n+1}|E_n) = P(θ|E_n) P(t_{n+1}, t_{n+2}|θ) / P(t_{n+2}, t_{n+1}|E_n)
the same as a single cumulative updating!
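The coherence property can also be verified numerically. The sketch below (illustrative numbers, not from the lecture) uses a Gamma prior on λ and hypothetical exponential failure times, for which the conjugate update is simply α → α + 1, β → β + t for each observation; updating one observation at a time gives exactly the same posterior parameters as a single cumulative updating.

```python
# Coherence check: sequential vs. single cumulative updating for a Gamma prior
# on lambda with exponentially distributed failure times (conjugate case).
a0, b0 = 2.0, 1000.0                       # hypothetical prior Gamma(a, b) for lambda
times = [120.0, 340.0, 95.0, 410.0]        # hypothetical failure times (hours)

# Multiple stage updating: one observation at a time
a, b = a0, b0
for t in times:
    a, b = a + 1, b + t                    # conjugate update for one exponential datum

# Single cumulative updating: all observations at once
a_all, b_all = a0 + len(times), b0 + sum(times)

print((a, b) == (a_all, b_all))            # True: same posterior
```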

Bayesian approach: some observations on the updating process
P(θ|E) ∝ P(θ) P(E|θ): Posterior ∝ Prior × Likelihood.
For values of θ at which both the prior and the likelihood are small, the posterior will be small; the bulk of the posterior lies where neither the prior nor the likelihood is negligible.
If the prior is very sharp (strong prior evidence), it will not change much unless the evidence is very strong.
Which of two priors, a sharp one or a flat one, will be more influenced by the evidence t? The posterior depends on the relative strength of the prior and the likelihood.
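A small numerical illustration of this point (hypothetical priors and data, not from the lecture): two Beta priors with the same mean but different sharpness are updated with the same binomial evidence; the diffuse prior moves much more.

```python
# Same evidence, different prior strength: the posterior mean of the diffuse
# prior shifts toward the data much more than that of the sharp prior.
k, n = 8, 10                                     # hypothetical evidence: 8 successes in 10 trials
priors = {"sharp": (50, 50), "diffuse": (2, 2)}  # hypothetical Beta(a, b) priors, both with mean 0.5
for name, (a, b) in priors.items():
    a_post, b_post = a + k, b + (n - k)          # conjugate Beta-binomial update
    print(f"{name}: prior mean = {a / (a + b):.3f}, "
          f"posterior mean = {a_post / (a_post + b_post):.3f}")
```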

Bayesian approach to parameter estimation: Large evidence - Example

Normal distribution as a limit of the binomial distribution (θ = 0.1)

Bayesian approach to parameter estimation: Large evidence - Example

Likelihood behavior as n → ∞, assuming k/n = 0.1

Bayesian approach to parameter estimation: Large evidence - Example

Bayesian approach to parameter estimation: Large evidence
As the evidence becomes stronger and stronger, the likelihood tends to a delta function. The posterior tends to a delta as well, centered around a single value, which is then the true value (perfect knowledge). The classical and Bayesian statistics become identical in their results (not conceptually).
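This concentration can be seen numerically. The sketch below (illustrative, with an arbitrary Beta prior) fixes k/n = 0.1 and lets n grow; the posterior standard deviation shrinks and the posterior mean settles on 0.1.

```python
# Posterior concentration with growing evidence for a Beta-binomial model.
from scipy.stats import beta

a0, b0 = 2.0, 2.0                      # hypothetical Beta prior on theta
for n in [10, 100, 1000, 10000]:
    k = int(0.1 * n)                   # keep the observed fraction k/n fixed at 0.1
    post = beta(a0 + k, b0 + n - k)    # conjugate posterior
    print(f"n={n:6d}  posterior mean={post.mean():.4f}  std={post.std():.4f}")
```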

Confidence intervals vs Credible intervals
Classical statistics: a 90% confidence interval means that, in repeated sampling, 90% of the intervals constructed in this way contain the parameter, which is a fixed value, although unknown. (Figure: estimated intervals [θ̂_1, θ̂_2] from Sample 1, Sample 2, ..., Sample 10, each with n = 100 and confidence level 0.9.)
Bayesian statistics: the parameter is a random variable with a given distribution, and a 90% credible interval tells me that right now, with my current knowledge, I am 90% confident that the true value (which I would discover if I gained perfect knowledge) falls within these bounds.

Confidence intervals vs Credible intervals
Classical statistics: confidence intervals capture the uncertainty about the interval we have obtained (i.e., whether it contains the true value or not). Thus, they cannot be interpreted as a probabilistic statement about the true parameter value.
Bayesian statistics: credible intervals capture our current uncertainty about the location of the parameter value and thus can be interpreted as a probabilistic statement about the parameter.
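The frequentist statement is about the long-run behaviour of the procedure, which can be checked by simulation. The following sketch (hypothetical true value, normal-approximation interval for a proportion) estimates the coverage of a 90% confidence interval over repeated samples.

```python
# Coverage simulation: roughly 90% of the intervals constructed this way cover
# the fixed true value, but nothing probabilistic is said about any one interval.
import numpy as np

rng = np.random.default_rng(0)
p_true, n, z90 = 0.3, 100, 1.645          # hypothetical true proportion and sample size
trials, covered = 5000, 0
for _ in range(trials):
    k = rng.binomial(n, p_true)
    p_hat = k / n
    half = z90 * np.sqrt(p_hat * (1 - p_hat) / n)
    covered += (p_hat - half <= p_true <= p_hat + half)
print(covered / trials)                   # close to 0.90
```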

Conjugate distributions
The likelihood L(θ) and the prior f(θ) are called conjugate distributions if the posterior π(θ|E) is in the same family as the prior distribution.
Example: likelihood: binomial distribution; prior: Beta distribution with parameters (q, r); posterior: Beta distribution (with different parameters).

Bayesian approach to parameter estimation: Families of conjugate distributions
Characteristics of conjugate distributions: the posterior is of the same form as the prior, with updated parameters; simple analytical estimates (mean and variance).
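Conjugacy can also be checked numerically. The sketch below (hypothetical prior parameters and evidence) compares a grid-computed posterior, Beta prior times binomial likelihood normalized numerically, with the closed-form conjugate posterior Beta(q + k, r + n - k); the two coincide up to rounding.

```python
# Numerical check of Beta-binomial conjugacy on a grid of theta values.
import numpy as np
from scipy.stats import beta, binom

q, r = 2.0, 5.0                     # hypothetical prior Beta(q, r)
k, n = 3, 12                        # hypothetical evidence: k failures in n demands
theta = np.linspace(1e-4, 1 - 1e-4, 2001)

numerical = beta.pdf(theta, q, r) * binom.pmf(k, n, theta)
numerical /= numerical.sum()                       # normalize over the grid
analytical = beta.pdf(theta, q + k, r + n - k)     # conjugate closed form
analytical /= analytical.sum()
print(np.max(np.abs(numerical - analytical)))      # ~0: same distribution on the grid
```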

Exercise 3: failure rate of a motor-driven pump of a NPP
Assume that the failure rate of the pump is constant, λ.
Assume the following three types of relevant information on this machine:
E1: engineering knowledge (description of the design and construction of the pump)
E2: past performance of similar pumps in similar plants: mean failure rate λ̄ = 3 × 10⁻⁵ h⁻¹, standard deviation σ_λ = 7.4 × 10⁻⁵ h⁻¹
E3: performance of the specific machine: 0 failures in t = 1000 h
Questions:
1) Use E1 and E2 to build the prior distribution of the failure rate, P(λ)
2) Update the prior using the information in E3. Determine the point estimate of λ and its 95th percentile.

Exercise 3: failure rate of a motor-driven pump of a NPP - Solution
To choose the family of the prior P(λ), observe that E3 can be seen as the result of a Poisson experiment, 0 occurrences in 1000 h:
P(k failures in [0, t]) = (λt)^k e^{-λt} / k!   (Poisson process)
The conjugate distribution to the Poisson is the Gamma with two parameters α', β':
P(λ) = Gamma(α', β') = β'^{α'} λ^{α'-1} e^{-β'λ} / Γ(α'),   with mean α'/β' and variance α'/β'²
I model the prior as a Gamma with parameters α', β' obtained from:
α'/β' = λ̄ = 3 × 10⁻⁵ h⁻¹
√α'/β' = σ_λ = 7.4 × 10⁻⁵ h⁻¹
which give α' = λ̄²/σ_λ² = 0.1644 and β' = λ̄/σ_λ² = 5478 h.

Exercise 3: failure rate of a motor-driven pump of a NPP - Solution
Gamma and Poisson are conjugate: the posterior is still a Gamma distribution, with parameters
α'' = α' + k = 0.1644
β'' = β' + t = 5478 + 1000 = 6478 h
Posterior mean: λ̄'' = α''/β'' = 2.5 × 10⁻⁵ h⁻¹
95th percentile: λ_95 = 1.4 × 10⁻⁴ h⁻¹

Exercise 3: failure rate of a motor-driven pump of a NPP - Solution
Frequentist statistics:
λ̂_MLE = k/t = 0/1000 h = 0 h⁻¹
λ_95 = χ²_0.95(2k + 2) / (2t) = χ²_0.95(2) / (2000 h) = 5.99 / (2000 h) ≈ 3 × 10⁻³ h⁻¹
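The numbers of Exercise 3 can be reproduced with a few lines of Python; this is an illustrative sketch using scipy (the printed values are rounded in the comments as in the text above).

```python
import numpy as np
from scipy.stats import gamma, chi2

# Prior from E2: match mean and standard deviation of the failure rate
lam_mean, lam_std = 3e-5, 7.4e-5
alpha_p = (lam_mean / lam_std) ** 2       # prior alpha', approximately 0.1644
beta_p = lam_mean / lam_std ** 2          # prior beta', approximately 5478 h

# Evidence E3: 0 failures in 1000 h -> conjugate Gamma-Poisson update
k, t = 0, 1000.0
alpha_post, beta_post = alpha_p + k, beta_p + t

post = gamma(a=alpha_post, scale=1.0 / beta_post)
print(post.mean())                        # approx 2.5e-5 1/h  (posterior mean)
print(post.ppf(0.95))                     # approx 1.4e-4 1/h  (95th percentile)

# Frequentist comparison: MLE = k/t = 0; upper 95% bound from the chi-square
print(chi2.ppf(0.95, 2 * k + 2) / (2 * t))   # approx 3e-3 1/h
```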

Bayesian vs Frequentist for large amounts of data
As already pointed out, the results of the Bayesian and classical analyses converge with large amounts of data, and the influence of the prior parameters α', β' decreases:
λ̄'' = α''/β'' = (α' + k)/(β' + t) → k/t = λ̂_MLE   for k, t → ∞
σ''² = α''/β''² = (α' + k)/(β' + t)² → 0
Thus, for large amounts of data the posterior distribution will be highly peaked around the MLE estimate λ̂_MLE = k/t.
It can also be shown that the Bayesian and frequentist 95th percentiles converge; one should, however, keep in mind the difference in meaning:
BAYESIAN = the analyst's subjective uncertainty concerning the value of the random variable λ
FREQUENTIST = variability in the estimation of λ (true value)
Finally, note that as the evidence increases our state of knowledge on the parameter improves, and at perfect knowledge (infinite evidence) the uncertainty on its value is zero; however, the failure process remains inherently aleatory.