Lecture 18: Noise modeling and introduction to decision theory


Beryl Boone

Learning Objectives:
    Hypothesis testing
    The receiver operator characteristic (ROC) curve
    Bayes's Theorem, positive and negative predictive value (PPV, NPV)
    Basic principles of supervised networks

Supplementary Slides: Probability.
Supplementary Reading: Fundamentals of Neural Networks.

I. Hypothesis Testing

A. Reference (or Parent) distribution. Typically a Gaussian distribution, assuming sufficient sampling of an unbiased random variable:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$

B. Sampled distribution has some measured mean, $\bar{x}$, and standard deviation, $s$:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$$
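The sample statistics in item B can be computed directly. The following is a minimal sketch, assuming the $N-1$ normalization for the standard deviation; the function names and the data values are hypothetical and chosen only for illustration.

```python
import math

def sample_mean(xs):
    """Measured mean of the sampled distribution: sum(x_i) / N."""
    return sum(xs) / len(xs)

def sample_std(xs):
    """Sample standard deviation, assuming the N - 1 normalization."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    m = sample_mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

# Hypothetical measurements, for illustration only.
data = [4.8, 5.1, 5.3, 4.9, 5.4]
x_bar = sample_mean(data)
s = sample_std(data)
print(f"mean = {x_bar:.3f}, std = {s:.3f}")
```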

C. Test or experiment must define a threshold for significance.

1. Typically state a Null hypothesis, $H_0$. $H_0$: Presume that the sampled distribution and the reference distribution are one and the same. Only reject the Null hypothesis when the sample mean, $\bar{x}$, falls beyond the threshold set by the significance level of the test, $\alpha$.

For a two-tailed test, the threshold could be met by $\bar{x}$ being greater than or less than the threshold. Thus, the value of $\alpha$ defines the probabilities of falsely rejecting the Null hypothesis:

$$\frac{\alpha}{2} = p\left(\bar{x} > x_{\alpha/2}\right), \qquad \frac{\alpha}{2} = p\left(\bar{x} < -x_{\alpha/2}\right) \qquad \text{(1a and 1b)}$$

where $\pm x_{\alpha/2}$ are the upper and lower thresholds, measured relative to the reference mean. For a one-tailed test,

$$\alpha = p\left(\bar{x} > x_{\alpha}\right) \qquad (2)$$

D. Type I and Type II errors.

    Type I error: The probability that $H_0$ is rejected when it is correct, defined by Eqs. 1a and 1b.

    Type II error: The probability that $H_0$ is accepted when it is false. Need to identify the underlying parent distribution of the sampled, or measured, data; $\beta$ is then evaluated under that distribution:

    $$\beta = p\left(\bar{x} < x_{\alpha}\right)$$

    Power: $1 - \beta$; a measure of our ability to resolve two distinct underlying distributions.
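The one-tailed and two-tailed rejection thresholds can be sketched numerically. This example assumes a Gaussian reference distribution whose mean `mu` and standard deviation `sigma` describe the sample mean under the Null hypothesis; the function names are hypothetical, and `statistics.NormalDist` is from the Python standard library (3.8 or later).

```python
from statistics import NormalDist

def one_tailed_threshold(alpha, mu=0.0, sigma=1.0):
    """Threshold x_alpha such that p(x_bar > x_alpha) = alpha under H0."""
    return NormalDist(mu, sigma).inv_cdf(1.0 - alpha)

def two_tailed_thresholds(alpha, mu=0.0, sigma=1.0):
    """Lower and upper thresholds, each cutting off alpha / 2 under H0."""
    ref = NormalDist(mu, sigma)
    return ref.inv_cdf(alpha / 2.0), ref.inv_cdf(1.0 - alpha / 2.0)

def reject_null(x_bar, alpha, mu=0.0, sigma=1.0, two_tailed=True):
    """Return True when the sample mean falls beyond the threshold(s)."""
    if two_tailed:
        lower, upper = two_tailed_thresholds(alpha, mu, sigma)
        return x_bar < lower or x_bar > upper
    return x_bar > one_tailed_threshold(alpha, mu, sigma)

# A sample mean of 1.8 (in units of sigma) at alpha = 0.05:
print(reject_null(1.8, 0.05, two_tailed=True))   # two-tailed test
print(reject_null(1.8, 0.05, two_tailed=False))  # one-tailed test
```

The same sample mean can be significant in a one-tailed test and not in a two-tailed test, because the two-tailed test splits $\alpha$ between both tails and so places each threshold farther from the mean.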

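E. The error function:

$$\mathrm{erf}(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy$$

for a normalized Gaussian (that is, erf here denotes the cumulative distribution function of the normalized Gaussian, so that $1 - \mathrm{erf}(x) = \mathrm{erf}(-x)$). To use it, perform a change of variable to a new variable, $z$, such that

$$z = \frac{x - \mu}{\sigma}$$

The distribution of $z$ is zero mean with standard deviation $\sigma = 1$. Let $d$ be the separation between the means of the two underlying distributions, in units of standard deviations. Then for the one-tailed case,

Type I error:

$$\alpha = \frac{1}{\sqrt{2\pi}} \int_{z_\alpha}^{\infty} e^{-y^2/2}\, dy = 1 - \mathrm{erf}(z_\alpha), \qquad (x > z_\alpha)$$

Type II error:

$$\beta = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z_\alpha} e^{-(y-d)^2/2}\, dy = \mathrm{erf}(z_\alpha - d)$$

Power:

$$1 - \beta = \mathrm{erf}(d - z_\alpha)$$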
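The one-tailed error rates can be evaluated numerically. This is a sketch under the assumption that the notes' erf is the cumulative distribution of the normalized Gaussian, which is what `statistics.NormalDist().cdf` computes; the function names are hypothetical, and `d` is the separation of the two distributions in units of standard deviations.

```python
from statistics import NormalDist

# Cumulative distribution of the normalized Gaussian (the notes' "erf").
_cdf = NormalDist().cdf

def type_i_error(z_alpha):
    """alpha = 1 - erf(z_alpha): area of the null distribution above threshold."""
    return 1.0 - _cdf(z_alpha)

def type_ii_error(z_alpha, d):
    """beta = erf(z_alpha - d): area of the shifted distribution below threshold."""
    return _cdf(z_alpha - d)

def power(z_alpha, d):
    """Power = 1 - beta = erf(d - z_alpha)."""
    return _cdf(d - z_alpha)

z = 1.645  # threshold giving alpha close to 0.05
for d in (0.0, 1.0, 2.0, 3.0):
    print(f"d = {d:.1f}: alpha = {type_i_error(z):.3f}, "
          f"beta = {type_ii_error(z, d):.3f}, power = {power(z, d):.3f}")
```

When `d = 0` the two distributions coincide and the power equals $\alpha$; the power rises toward 1 as the separation grows.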

II. Receiver operator characteristic (ROC) curve

A. Constructing the curve

Lowercase $n$ and $s$ denote the true state (noise or signal); uppercase $N$ and $S$ denote the decision.

1. $P(n)$ and $P(s)$, the noise and signal distributions: $k_o = d$, in units of standard deviations; $k_t$, or k-threshold, $= z_{1-\alpha}$.

2. Type I error: Probability of detecting signal when signal is absent

$$P(S|n) = 1 - P(N|n) = 1 - \text{Specificity} = 1 - \mathrm{erf}(k_t) = \mathrm{erf}(-k_t)$$

Type II error: Probability of detecting noise when signal is present

$$P(N|s) = \beta = 1 - P(S|s) = 1 - \text{Sensitivity}$$

$$\text{Sensitivity} = 1 - \beta = \text{Power} = \mathrm{erf}(k_o - k_t)$$

3. Sensitivity and Specificity.

$$\text{Sensitivity} = \text{True Positive Fraction} = \frac{TP}{TP + FN} = P(S|s)$$

$$\text{Specificity} = \text{True Negative Fraction} = \frac{TN}{TN + FP} = P(N|n)$$

$$P(S|n) + P(N|n) = 1, \qquad P(S|s) + P(N|s) = 1$$
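An ROC curve follows from sweeping the threshold $k_t$ and recording the false positive fraction (1 - Specificity) against the true positive fraction (Sensitivity). The sketch below assumes equal-variance Gaussian noise and signal distributions separated by $k_o$ standard deviations, with erf taken as the cumulative normalized Gaussian; the function names and the sweep range are hypothetical.

```python
from statistics import NormalDist

# Cumulative distribution of the normalized Gaussian (the notes' "erf").
_PHI = NormalDist().cdf

def roc_point(k_t, k_o):
    """Return (false positive fraction, true positive fraction) at threshold k_t."""
    fpf = _PHI(-k_t)        # P(S|n) = 1 - Specificity = erf(-k_t)
    tpf = _PHI(k_o - k_t)   # P(S|s) = Sensitivity = erf(k_o - k_t)
    return fpf, tpf

def roc_curve(k_o, k_min=-4.0, k_max=8.0, steps=49):
    """Sweep the threshold from high to low so the FPF increases along the curve."""
    step = (k_max - k_min) / (steps - 1)
    thresholds = [k_max - i * step for i in range(steps)]
    return [roc_point(k_t, k_o) for k_t in thresholds]

curve = roc_curve(k_o=2.0)
for fpf, tpf in curve[::8]:
    print(f"FPF = {fpf:.3f}, TPF = {tpf:.3f}")
```

With `k_o = 0` the curve collapses onto the diagonal (TPF equals FPF), which corresponds to a test that cannot distinguish signal from noise.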

4. Positive and Negative Predictive Values.

    Positive Predictive Value: The probability that a patient has the disease given a positive test result $= P(s|S)$.

    Negative Predictive Value: The probability that a patient does not have the disease given a negative test result $= P(n|N)$.

Use Bayes's Theorem:

$$P(s|S) = \frac{P(S|s)\,P(s)}{P(S|s)\,P(s) + P(S|n)\,P(n)} = \text{PPV}$$

III. Introduction to Networks.

1. Neural Networks: Concept based on basic features of biological nervous systems.
    Interconnecting identical nodes, or processing elements (PEs).
    The PEs are organized into layers.
    The number of PEs per layer is a design choice based on how many inputs are desired and their correlation with each other.
    The interconnections between PEs are also a design choice.
    The final layer is the output layer and the other layers are hidden layers.

[Figure from (Zhilouchian)]

2. Processing Elements

Output, $O$, of each PE:

$$O = g\left[X^{T} W\right] = g\left[\sum_{i=1}^{N} x_i w_i\right] = g[S]$$

$O$ is a scalar, $X$ is the input vector, and $W$ is the weight vector associated with a given PE. The weights are adjusted by the training process. $g[\cdot]$ is a nonlinear activation function, generally with a sigmoid dependence, as shown above in the Figure from Zhilouchian's chapter.
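The output of a single processing element can be sketched in a few lines. The notes say only that the activation generally has a sigmoid dependence, so the logistic function used here is an assumption, as are the function names and the example inputs and weights.

```python
import math

def sigmoid(s):
    """Logistic activation, one common sigmoid choice (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-s))

def pe_output(x, w, g=sigmoid):
    """Output O = g[S] of one PE, where S is the weighted sum of the inputs."""
    if len(x) != len(w):
        raise ValueError("input and weight vectors must have the same length")
    s = sum(xi * wi for xi, wi in zip(x, w))  # S = sum_i x_i * w_i
    return g(s)

# Hypothetical input vector X and weight vector W.
x = [1.0, 2.0]
w = [0.5, -0.25]
print(pe_output(x, w))  # weighted sum is 0, so the logistic output is 0.5
```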


During training, feature vectors of known objects are presented in random order to the network. The interconnection weights of the PEs are adjusted for each iteration of the training. Presumably, the training converges on a set of weights that identifies the appropriate set of measures and their relative importance to making the decision, i.e., malignant or benign in the cancer setting.

3. Bayesian Networks
    Weighting factors derived from the predictive value of a set of given variables.
    Recognition that many factors contribute to diagnosis: (Burnside et al, A Bayesian Network for Mammography)
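Since these weighting factors rest on predictive values, it may help to see the Bayes's theorem expression for PPV from Section II evaluated numerically, together with the analogous expression for NPV. This is a minimal sketch; the function names are hypothetical and the sensitivity, specificity, and prevalence values are invented for illustration.

```python
def ppv(sensitivity, specificity, prevalence):
    """P(s|S): probability of disease given a positive test, via Bayes's theorem."""
    true_pos = sensitivity * prevalence                   # P(S|s) P(s)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(S|n) P(n)
    return true_pos / (true_pos + false_pos)

def npv(sensitivity, specificity, prevalence):
    """P(n|N): probability of no disease given a negative test."""
    true_neg = specificity * (1.0 - prevalence)           # P(N|n) P(n)
    false_neg = (1.0 - sensitivity) * prevalence          # P(N|s) P(s)
    return true_neg / (true_neg + false_neg)

# Hypothetical test: 90% sensitivity, 90% specificity, 1% prevalence.
print(f"PPV = {ppv(0.9, 0.9, 0.01):.4f}")
print(f"NPV = {npv(0.9, 0.9, 0.01):.4f}")
```

With these illustrative numbers the PPV is only about 0.083 even though sensitivity and specificity are both 0.9, because at low prevalence the false positives from the large disease-free group outnumber the true positives.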
