Parameter Learning With Binary Variables

1 Parameter Learning With Binary Variables
University of Nebraska-Lincoln, CSCE 970: Pattern Recognition

2 Outline
1. Learning a Single Parameter
2. More on the Beta Density Function
3. Computing a Probability Interval

3 Outline
4. Learning Parameters in a Bayesian Network
5. Learning with Missing Data Items
6. Variances in Computed Relative Frequencies

4 Outline
1. Learning a Single Parameter
   - Probability Distributions of Relative Frequencies
   - Learning a Relative Frequency
2. More on the Beta Density Function
3. Computing a Probability Interval

5 Equally Probable Relative Frequencies
The urn example can be modeled by the following Bayesian network: a root node F whose possible values f (the relative frequencies, up to 1.00) are all equally probable, and a child node Side with
P(Side = heads | f) = f

6 Review of the Gamma Function
Γ(x) = ∫_0^∞ t^(x-1) e^(-t) dt
The integral converges if and only if x > 0. If x is an integer ≥ 1, it can be shown that Γ(x) = (x - 1)!

7 Introducing the Beta Density Function
The beta density function with parameters a, b, and N = a + b, where a and b are real numbers > 0, is
ρ(f) = [Γ(N) / (Γ(a)Γ(b))] f^(a-1) (1 - f)^(b-1),   0 ≤ f ≤ 1
A beta density function is denoted beta(f; a, b).
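
The density above can be evaluated directly with the standard library's log-gamma function. The sketch below is my own (the helper name beta_pdf is not from the slides); it works in log space so that large a and b do not overflow.

```python
from math import lgamma, log, exp

def beta_pdf(f, a, b):
    """Evaluate beta(f; a, b) = Gamma(a+b) / (Gamma(a)Gamma(b)) * f^(a-1) * (1-f)^(b-1)."""
    if not 0.0 < f < 1.0:
        raise ValueError("f must lie strictly between 0 and 1")
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(f) + (b - 1) * log(1 - f))

print(beta_pdf(0.5, 1, 1))   # beta(f; 1, 1) is the uniform density: 1.0
print(beta_pdf(0.9, 18, 2))  # mass concentrated near a/(a+b) = 0.9
```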

8 Example Beta Function Plots
Plots of beta(f; 50, 50), beta(f; 3, 3), and beta(f; 18, 2).
Note that the larger the values of a and b, the more mass is concentrated around a/(a + b).

9 Expected Value
If F has a beta distribution with parameters a, b, N = a + b, then
E(F) = a / N
We assess our beliefs such that P(X = 1 | f) = f. In this case we have
P(X = 1) = E(F) = a / N

10 Urn Revisited
The probability that the first coin chosen lands heads is 0.5.
What if we toss it 20 times and it lands heads 18 of those times?
We now feel the coin's relative frequency of heads is closer to 0.90 than to 0.10.
How do we quantify such a change in belief?

11 Independence of Trials
The model is a node F with density ρ(f) and a child X with P(X = 1 | F = f) = f.
If we know the value f, then each trial X^(h) of X is independent of every other trial X^(i), i ≠ h.

12 Probability of a Data Set
Suppose we have a data set d = {x^(1), x^(2), ..., x^(M)}. Let s be the number of variables in d equal to 1, and t be the number of variables in d equal to 2. Then
P(d) = E(F^s (1 - F)^t)
If F has a beta distribution, it can be shown that
E(F^s (1 - F)^t) = [Γ(N) / Γ(N + M)] · [Γ(a + s) Γ(b + t) / (Γ(a) Γ(b))]
Therefore
P(d) = [Γ(N) / Γ(N + M)] · [Γ(a + s) Γ(b + t) / (Γ(a) Γ(b))]

13 Urn Example
Recall the urn example, with ρ(f) = beta(f; 1, 1). Consider the binomial sample d = {1, 2}. We have a = b = 1, N = 2, s = 1, t = 1, M = 2. Thus
P(d) = [Γ(2) / Γ(2 + 2)] · [Γ(1 + 1) Γ(1 + 1) / (Γ(1) Γ(1))] = 1/6
Now consider the sample d' = {1, 1}. We have P(d') = 1/3.
Why is the probability of two heads twice the probability of one heads and one tails?
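
The two values above are easy to check numerically. The sketch below is my own (the helper name prob_data is not from the slides); it evaluates the data-set probability formula in log space and reproduces P(d) = 1/6 and P(d') = 1/3.

```python
from math import lgamma, exp

def prob_data(a, b, s, t):
    """P(d) = Gamma(N)/Gamma(N+M) * Gamma(a+s)Gamma(b+t) / (Gamma(a)Gamma(b))."""
    N, M = a + b, s + t
    log_p = (lgamma(N) - lgamma(N + M)
             + lgamma(a + s) + lgamma(b + t)
             - lgamma(a) - lgamma(b))
    return exp(log_p)

print(prob_data(1, 1, 1, 1))  # d  = {1, 2}: 1/6 ~ 0.1667
print(prob_data(1, 1, 2, 0))  # d' = {1, 1}: 1/3 ~ 0.3333
```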

14 Updating the Parameter Density Function
Given a data set d and an original density function ρ(f) = beta(f; a, b), the updated density function is given by
ρ(f | d) = beta(f; a + s, b + t)
The probability that the next trial is equal to 1, denoted P(X^(M+1) = 1 | d), is equal to E(F | d). Assuming F has a beta distribution with parameters a, b, N = a + b, this becomes
P(X^(M+1) = 1 | d) = (a + s) / (N + M)

15 Example of Updating a Density Function
Consider the thumbtack example with density function ρ(f) = beta(f; 3, 3) and the sample data set
d = {1, 1, 2, 1, 1, 1, 1, 1, 2, 1}
Our updated density function becomes
ρ(f | d) = beta(f; 3 + 8, 3 + 2) = beta(f; 11, 5)
Our probability that the next trial will produce a 1 becomes
P(X^(11) = 1 | d) = 11/16 = 0.6875
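
A minimal sketch of the update rule and predictive probability (my own helper name, not from the slides), checked on the thumbtack example above:

```python
def update(a, b, data):
    """Return (a + s, b + t), where s counts the 1's and t counts the 2's in data."""
    s = sum(1 for x in data if x == 1)
    t = sum(1 for x in data if x == 2)
    return a + s, b + t

d = [1, 1, 2, 1, 1, 1, 1, 1, 2, 1]
a_post, b_post = update(3, 3, d)      # beta(f; 11, 5)
print(a_post, b_post)
print(a_post / (a_post + b_post))     # P(X^(11) = 1 | d) = 11/16 = 0.6875
```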

16 Outline
1. Learning a Single Parameter
2. More on the Beta Density Function
3. Computing a Probability Interval

17 Review
Our parameters F_ij are random variables with a beta distribution, denoted beta(f; a, b).
P(X = 1 | f) = f
Expected value: E(F) = a / N, where N = a + b
Data set probability: P(d) = [Γ(N) / Γ(N + M)] · [Γ(a + s) Γ(b + t) / (Γ(a) Γ(b))]
Updated density function: ρ(f | d) = beta(f; a + s, b + t)
Updated probability: P(X^(M+1) = 1 | d) = (a + s) / (N + M)

18 Scrutinizing the Updated Probability
Recall that we update the probability from a data set d as follows:
P(X^(M+1) = 1 | d) = (a + s) / (N + M)
We use the uniform distribution given by beta(f; 1, 1) to model a lack of belief about the true probabilities of X.
Why do we use the updated function beta(f; 1 + s, 1 + t) rather than just beta(f; s, t) to model our belief once we have concrete trials?

19 Avoiding Overly Confident Beliefs
Consider the urn example. Suppose we sample a coin at random and flip it. If it lands heads we have the data set d = {1}.
If we updated our belief based solely on the concrete trial, we would have ρ(f | d) = beta(f; 1, 0), which would give the updated probability P(X^(2) = 1 | d) = 1/1 = 100%.
Updating our belief as usual, we instead have ρ(f | d) = beta(f; 2, 1), leading to the updated probability P(X^(2) = 1 | d) = 2/3 ≈ 67%.

20 Beta Function With 0 < a < 1, 0 < b < 1
- The relative frequency of one of the two values is very low
- We are not sure which value
- The belief is quickly overwhelmed by data
Example plot: beta(f; 0.2, 0.2)

21 Assessing the Values of a and b
The following are guidelines for choosing values of a and b.
- a = b = 1: belief that we have no knowledge at all of the value of the relative frequency.
- a, b > 1: belief that it is probable that the relative frequency with which X = 1 is around a/(a + b). The larger the values of a and b, the stronger the belief.
- a, b < 1: belief that the relative frequency with which X = 1 is either very high or very low, but we are not sure which.

22 Outline
1. Learning a Single Parameter
2. More on the Beta Density Function
3. Computing a Probability Interval

23 Motivation for a Probability Interval
We saw earlier that P(X = 1 | f) = f. How confident are we that the true probability is near f?
We measure this confidence by finding a value c such that
P(f ∈ (E(F) - c, E(F) + c)) = perc
where (E(F) - c, E(F) + c) is an interval, known as a probability interval, such that 100(perc)% of the area under the beta curve lies within it.

24 Computing a Probability Interval
A perc% probability interval for E(F) is found by solving the following equation for c:
∫_{E(F)-c}^{E(F)+c} ρ(f) df = perc

25 Example
Recall the updated density function we computed for the thumbtack example:
ρ(f) = beta(f; 11, 5), with E(F) = 11/16 = 0.6875
To find a 95% probability interval we solve the following equation for c:
∫_{0.6875-c}^{0.6875+c} [Γ(16) / (Γ(11)Γ(5))] f^10 (1 - f)^4 df = 0.95
We obtain the solution c ≈ 0.214, which gives the probability interval
(0.6875 - 0.214, 0.6875 + 0.214) ≈ (0.474, 0.902)
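
There is no closed form for c, but it can be found numerically. The sketch below is my own (it assumes SciPy is available); it uses the regularized incomplete beta function as the beta CDF and a root finder to solve the equation above for the thumbtack example.

```python
from scipy.special import betainc   # regularized incomplete beta = CDF of beta(a, b)
from scipy.optimize import brentq

a, b, perc = 11, 5, 0.95
mean = a / (a + b)                  # E(F) = 0.6875

def mass(c):
    """Probability mass of beta(a, b) in (mean - c, mean + c)."""
    return betainc(a, b, mean + c) - betainc(a, b, mean - c)

c = brentq(lambda c: mass(c) - perc, 0.0, min(mean, 1 - mean))
print(c, (mean - c, mean + c))      # c ~ 0.214, interval ~ (0.474, 0.902)
```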

26 Example
Suppose we have a = 31 and b = 1, leading to E(F) = 31/32 = 0.96875. Solving the equation
∫_{E(F)-c}^{E(F)+c} [Γ(32) / (Γ(31)Γ(1))] f^30 (1 - f)^0 df = 0.95
we find c ≈ 0.033. This leads to the probability interval (0.936, 1.002), which extends beyond 1.
We therefore re-compute c from the following equation:
∫_{E(F)-c}^{1} [Γ(32) / (Γ(31)Γ(1))] f^30 (1 - f)^0 df = 0.95
We now obtain c ≈ 0.061 and the probability interval (0.96875 - 0.061, 1) ≈ (0.908, 1).

27 General Probability Intervals
If c is computed and (E(F) - c, E(F) + c) ⊄ (0, 1) with E(F) > 0.5, solve the following equation for c:
∫_{E(F)-c}^{1} ρ(f) df = perc
The probability interval is then (E(F) - c, 1).
If (E(F) - c, E(F) + c) ⊄ (0, 1) with E(F) < 0.5, solve the following equation for c:
∫_{0}^{E(F)+c} ρ(f) df = perc
The probability interval is then (0, E(F) + c).
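
The same numerical approach handles the one-sided cases. The sketch below is my own (again assuming SciPy); it applies the first rule above to beta(f; 31, 1) from the previous slide.

```python
from scipy.special import betainc
from scipy.optimize import brentq

a, b, perc = 31, 1, 0.95
mean = a / (a + b)                  # E(F) ~ 0.969 > 0.5, symmetric interval exceeds 1

# Solve  integral from E(F)-c to 1 of rho(f) df = perc  for c.
c = brentq(lambda c: (1.0 - betainc(a, b, mean - c)) - perc, 0.0, mean)
print(c, (mean - c, 1.0))           # c ~ 0.061, interval ~ (0.908, 1)
```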

28 Outline
4. Learning Parameters in a Bayesian Network
   - Urn Examples
   - Learning Using an Augmented Bayesian Network
   - A Problem with Updating; Using an Equivalent Sample Size
5. Learning with Missing Data Items
6. Variances in Computed Relative Frequencies

29 Urn Revisited
Consider two identical urns. We sample and toss one coin from each urn.
Augmented network: F11 with density beta(f11; 1, 1) pointing to X1, where P(X1 = 1 | f11) = f11; and F21 with density beta(f21; 1, 1) pointing to X2, where P(X2 = 1 | f21) = f21.
Embedded network: X1 and X2 with P(X1 = 1) = 1/2 and P(X2 = 1) = 1/2.

30 Example: Joint Probabilities
Since the two nodes X1 and X2 are independent, we have the following joint probabilities:
P(X1 = 1, X2 = 1) = P(X1 = 1) P(X2 = 1) = (1/2)(1/2) = 1/4
P(X1 = 1, X2 = 2) = P(X1 = 1) P(X2 = 2) = (1/2)(1/2) = 1/4
P(X1 = 2, X2 = 1) = P(X1 = 2) P(X2 = 1) = (1/2)(1/2) = 1/4
P(X1 = 2, X2 = 2) = P(X1 = 2) P(X2 = 2) = (1/2)(1/2) = 1/4
These are NOT relative frequencies. They are our beliefs concerning the first outcome.

31 Example: Updated Values
Suppose the first 7 trials yield a data set d in which X1 = 1 in 4 cases and X1 = 2 in 3 cases, and X2 = 1 in 5 cases and X2 = 2 in 2 cases. We then have the updated density functions
ρ(f11 | d) = beta(f11; 1 + 4, 1 + 3) = beta(f11; 5, 4)
ρ(f21 | d) = beta(f21; 1 + 5, 1 + 2) = beta(f21; 6, 3)
Thus we now have the joint distribution
P(X1 = 1, X2 = 1) = (5/9)(6/9) = 10/27
P(X1 = 2, X2 = 1) = (4/9)(6/9) = 8/27
P(X1 = 1, X2 = 2) = (5/9)(3/9) = 5/27
P(X1 = 2, X2 = 2) = (4/9)(3/9) = 4/27

32 Example: Three Urns
Suppose we have three urns u1, u2, u3. We sample a coin from u1 and flip it. If it turns up heads we sample a coin from u2 and flip it; otherwise we sample a coin from u3 and flip it.
This situation can be modeled by an augmented Bayesian network with parameter nodes F11, F21, F22, each with density beta(·; 1, 1), where
P(X1 = 1 | f11) = f11
P(X2 = 1 | X1 = 1, f21) = f21
P(X2 = 1 | X1 = 2, f22) = f22

33 Three Urns: Joint Probabilities
The previous augmented Bayesian network contains the following embedded network: X1 → X2 with
P(X1 = 1) = 1/2, P(X2 = 1 | X1 = 1) = 1/2, P(X2 = 1 | X1 = 2) = 1/2
This network has the following joint probabilities:
P(X1 = 1, X2 = 1) = P(X2 = 1 | X1 = 1) P(X1 = 1) = (1/2)(1/2) = 1/4
P(X1 = 1, X2 = 2) = 1/4, P(X1 = 2, X2 = 1) = 1/4, P(X1 = 2, X2 = 2) = 1/4

34 Three Urns: Updated Values
Suppose the first 7 trials yield a data set d in which X1 = 1 in 4 cases (with X2 = 1 in 3 of them and X2 = 2 in 1) and X1 = 2 in 3 cases (with X2 = 1 in 2 of them and X2 = 2 in 1). We then have the updated density functions
ρ(f11 | d) = beta(f11; 1 + 4, 1 + 3) = beta(f11; 5, 4)
ρ(f21 | d) = beta(f21; 1 + 3, 1 + 1) = beta(f21; 4, 2)
ρ(f22 | d) = beta(f22; 1 + 2, 1 + 1) = beta(f22; 3, 2)
Thus we now have the joint distribution
P(X1 = 1, X2 = 1) = (5/9)(4/6) = 10/27
P(X1 = 1, X2 = 2) = (5/9)(2/6) = 5/27
P(X1 = 2, X2 = 1) = (4/9)(3/5) = 4/15
P(X1 = 2, X2 = 2) = (4/9)(2/5) = 8/45
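
The updated joint distribution can be reproduced by counting per parent configuration. The sketch below is my own: the cases list is one ordering consistent with the counts given above, and the variable names are assumptions.

```python
cases = [(1, 1), (1, 1), (1, 2), (1, 1), (2, 1), (2, 1), (2, 2)]  # one ordering consistent with the slide's counts

s11 = sum(1 for x1, _ in cases if x1 == 1)               # 4
t11 = len(cases) - s11                                   # 3
s21 = sum(1 for x1, x2 in cases if x1 == 1 and x2 == 1)  # 3
t21 = sum(1 for x1, x2 in cases if x1 == 1 and x2 == 2)  # 1
s22 = sum(1 for x1, x2 in cases if x1 == 2 and x2 == 1)  # 2
t22 = sum(1 for x1, x2 in cases if x1 == 2 and x2 == 2)  # 1

f11 = (1 + s11) / (2 + s11 + t11)   # E(F11 | d) = 5/9
f21 = (1 + s21) / (2 + s21 + t21)   # E(F21 | d) = 4/6
f22 = (1 + s22) / (2 + s22 + t22)   # E(F22 | d) = 3/5
print(f11 * f21, f11 * (1 - f21))              # P(1,1) = 10/27, P(1,2) = 5/27
print((1 - f11) * f22, (1 - f11) * (1 - f22))  # P(2,1) = 4/15,  P(2,2) = 8/45
```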

35 Augmented Bayesian Networks
Consider the three-urn example, with parameter nodes F11, F21, F22 (each with density beta(·; 1, 1)) and
P(X1 = 1 | f11) = f11, P(X2 = 1 | X1 = 1, f21) = f21, P(X2 = 1 | X1 = 2, f22) = f22
Global and local parameter independence imply
ρ(f11, f12, ..., f_{n q_n}) = ρ(f11) ρ(f12) ··· ρ(f_{n q_n})

36 Binomial Bayesian Network Sample
Suppose we have M random vectors
X^(1) = (X1^(1), ..., Xn^(1)), X^(2) = (X1^(2), ..., Xn^(2)), ..., X^(M) = (X1^(M), ..., Xn^(M))
and the random vector set
D = {X^(1), X^(2), ..., X^(M)}
such that, for every i, each Xi^(h) has the space {1, 2}.

37 Binomial Bayesian Network Sample (Continued)
Suppose also that there is a binomial Bayesian network (G, F, ρ), where G = (V, E), such that for 1 ≤ h ≤ M the set {X1^(h), ..., Xn^(h)} constitutes an instance of V in G, resulting in a distinct augmented Bayesian network.
Then the random vector set D is called a binomial Bayesian network sample.
(In the two-variable example, the parameter nodes F11, F21, F22 are shared by the instances X1^(1), X2^(1) and X1^(2), X2^(2).)

38 General Data Set Probability
Suppose we have a random vector set D and a set of data values of the X^(h)'s:
x^(1) = (x1^(1), ..., xn^(1)), x^(2) = (x1^(2), ..., xn^(2)), ..., x^(M) = (x1^(M), ..., xn^(M))
d = {x^(1), x^(2), ..., x^(M)}
Suppose also that s_ij is the number of cases in which xi^(h) equals 1 (with the parents of Xi in their j-th configuration), and t_ij is the number of cases in which xi^(h) equals 2.
Then we have the data set probability
P(d) = ∏_{i=1}^{n} ∏_{j=1}^{q_i} [Γ(N_ij) / Γ(N_ij + M_ij)] · [Γ(a_ij + s_ij) Γ(b_ij + t_ij) / (Γ(a_ij) Γ(b_ij))]
where M_ij = s_ij + t_ij.
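
A sketch of the product formula (my own function and variable names, assuming the priors and counts are supplied as nested lists indexed by node i and parent configuration j):

```python
from math import lgamma, exp

def prob_data(a, b, s, t):
    """P(d) as a product over nodes i and parent configurations j."""
    log_p = 0.0
    for i in range(len(a)):
        for j in range(len(a[i])):
            N, M = a[i][j] + b[i][j], s[i][j] + t[i][j]
            log_p += (lgamma(N) - lgamma(N + M)
                      + lgamma(a[i][j] + s[i][j]) + lgamma(b[i][j] + t[i][j])
                      - lgamma(a[i][j]) - lgamma(b[i][j]))
    return exp(log_p)

# Three-urn example: uniform priors and the counts from the 7-trial data set.
a = [[1], [1, 1]]; b = [[1], [1, 1]]
s = [[4], [3, 2]]; t = [[3], [1, 1]]
print(prob_data(a, b, s, t))
```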

39 General Updated Density Function
Suppose again that we have a random vector set D, data d = {x^(1), x^(2), ..., x^(M)}, and counts s_ij and t_ij defined as on the previous slide.
Assuming each F_ij has an original beta distribution, we have the updated density function
ρ(f_ij | d) = beta(f_ij; a_ij + s_ij, b_ij + t_ij)

40 Problem Overview
Consider the augmented network with beta(f11; 1, 1), beta(f21; 1, 1), beta(f22; 1, 1) and embedded probabilities P(X1 = 1) = 1/2, P(X2 = 1 | X1 = 1) = 1/2, P(X2 = 1 | X1 = 2) = 1/2.
beta(f11; 1, 1) represents prior experience of seeing X1 take the value 1 once in two trials.
beta(f21; 1, 1) represents prior experience of seeing X2 take the value 1 once out of the two times that X1 took the value 1.
These two prior sample sizes are inconsistent with each other.

41 Prior Equivalent Sample Size
To solve this problem we specify the same prior sample size at each node:
beta(f11; 2, 2), beta(f21; 1, 1), beta(f22; 1, 1)
with embedded probabilities P(X1 = 1) = 1/2, P(X2 = 1 | X1 = 1) = 1/2, P(X2 = 1 | X1 = 2) = 1/2.

42 General Equivalent Sample Size
Given an augmented binomial Bayesian network with beta density functions, if there is a number N_equiv such that, for all i and j,
N_ij = a_ij + b_ij = P(pa_ij) × N_equiv
then the network has an equivalent sample size N_equiv.

43 Equivalent Sample Size Example
A network with X1 → X3 ← X2 and parameter densities
beta(f11; 10, 5), beta(f21; 9, 6), beta(f31; 2, 4), beta(f32; 3, 1), beta(f33; 2, 1), beta(f34; 1, 1)
has N_equiv = 15.
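
As a quick check (my own sketch, assuming F31 through F34 correspond to the parent configurations (X1=1, X2=1), (X1=1, X2=2), (X1=2, X2=1), (X1=2, X2=2) in that order), each N_3j equals P(pa_3j) times N_equiv:

```python
n_equiv = 15
p_x1_1, p_x2_1 = 10 / 15, 9 / 15                 # from beta(f11; 10, 5) and beta(f21; 9, 6)
pa_probs = [p_x1_1 * p_x2_1, p_x1_1 * (1 - p_x2_1),
            (1 - p_x1_1) * p_x2_1, (1 - p_x1_1) * (1 - p_x2_1)]
n_3j = [2 + 4, 3 + 1, 2 + 1, 1 + 1]              # a_3j + b_3j for F31..F34
print([round(p * n_equiv, 6) for p in pa_probs]) # [6.0, 4.0, 3.0, 2.0]
print(n_3j)                                      # [6, 4, 3, 2]
```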

44 Constructing Equivalent Sample Size Bayesian Networks
Given an augmented Bayesian network, we can assign, for all i and j,
a_ij = b_ij = N / (2 q_i)
This gives an equal probability to each value at each node.
Given a Bayesian network, the parameters F_ij can be assigned the values
a_ij = P(X_i = 1 | pa_ij) · P(pa_ij) · N_equiv
b_ij = P(X_i = 2 | pa_ij) · P(pa_ij) · N_equiv
This yields an augmented Bayesian network that embeds the given Bayesian network.
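
A sketch of the second assignment (my own helper name beta_params; the worked example assumes the parent-configuration ordering used above for the N_equiv = 15 network):

```python
def beta_params(p_xi1_given_paij, p_paij, n_equiv):
    """a_ij = P(X_i=1 | pa_ij) P(pa_ij) N_equiv, and b_ij likewise for X_i = 2."""
    a_ij = p_xi1_given_paij * p_paij * n_equiv
    b_ij = (1 - p_xi1_given_paij) * p_paij * n_equiv
    return a_ij, b_ij

# F31 with parent configuration X1 = 1, X2 = 1: P(pa) = (10/15)(9/15) = 2/5
# and P(X3 = 1 | pa) = 2/6, giving beta(f31; 2, 4).
a31, b31 = beta_params(2 / 6, (10 / 15) * (9 / 15), 15)
print(round(a31, 6), round(b31, 6))   # 2.0 4.0
```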

45 Expressing Prior Indifference
Recall the augmented Bayesian network with beta(f11; 2, 2), beta(f21; 1, 1), beta(f22; 1, 1) and embedded probabilities P(X1 = 1) = 1/2, P(X2 = 1 | X1 = 1) = 1/2, P(X2 = 1 | X1 = 2) = 1/2.
Though this network has an equivalent sample size, it no longer models our indifference to the true parameter values.

46 Expressing Prior Indifference
We can solve this problem by using an equivalent sample size N_equiv = 2 to describe indifference in a Bayesian network. Using the previous example, this gives the new network
beta(f11; 1, 1), beta(f21; 0.5, 0.5), beta(f22; 0.5, 0.5)
with embedded probabilities P(X1 = 1) = 1/2, P(X2 = 1 | X1 = 1) = 1/2, P(X2 = 1 | X1 = 2) = 1/2.

47 Outline
4. Learning Parameters in a Bayesian Network
5. Learning with Missing Data Items
   - Data Items Missing at Random
6. Variances in Computed Relative Frequencies

48 Review of Updating
Suppose we have the Bayesian network with beta(f11; 2, 2), beta(f21; 1, 1), beta(f22; 1, 1) and embedded probabilities P(X1 = 1) = 1/2, P(X2 = 1 | X1 = 1) = 1/2, P(X2 = 1 | X1 = 2) = 1/2, together with a data set d of 5 cases in which X1 = 1 in 4 cases (X2 = 1 in 3 of them, X2 = 2 in 1) and X1 = 2 in 1 case (with X2 = 2).
We have the updated values:
ρ(f11 | d) = beta(f11; 2 + 4, 2 + 1) = beta(f11; 6, 3)
ρ(f21 | d) = beta(f21; 1 + 3, 1 + 1) = beta(f21; 4, 2)
ρ(f22 | d) = beta(f22; 1 + 0, 1 + 1) = beta(f22; 1, 2)

49 Randomly Missing Data Items
Consider the same network, but now with two of the values of X2 missing:

Case  X1  X2
1     1   1
2     1   1
3     1   2
4     1   ?
5     2   ?

We can estimate the missing values of X2 in these cases using P(X2 | X1).

50 Prior Sample Probability Substitution
Substituting the prior probabilities for the missing data yields the following fractional counts:

X1  X2  Occurrences
1   1   2 + 1/2
1   2   1 + 1/2
2   1   1/2
2   2   1/2

This gives us the updated network
beta(f11; 6, 3), beta(f21; 7/2, 5/2), beta(f22; 3/2, 3/2)
with P(X1 = 1) = 2/3, P(X2 = 1 | X1 = 1) = 7/12, P(X2 = 1 | X1 = 2) = 1/2.

51 Incorporating Data Set Values
Note that we used our prior sample probabilities to fill in the missing data. However, the data set favors the event X1 = 1, X2 = 1 over the event X1 = 1, X2 = 2, because the former occurs twice while the latter occurs only once.
This suggests that we may get more accurate results by using our updated probabilities in place of our prior sample probabilities.

52 Incorporating Data Set Values
Using the updated probabilities gives the fractional counts

X1  X2  Occurrences
1   1   2 + 7/12
1   2   1 + 5/12
2   1   1/2
2   2   1/2

This gives us a new updated network
beta(f11; 6, 3), beta(f21; 43/12, 29/12), beta(f22; 3/2, 3/2)
with P(X1 = 1) = 2/3, P(X2 = 1 | X1 = 1) = 43/72, P(X2 = 1 | X1 = 2) = 1/2.
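
Both substitution passes can be reproduced with fractional counts. The sketch below is my own (assumed helper names, using the 5-case data set described above); the first call uses the prior probabilities and the second uses the probabilities obtained from that first update.

```python
complete = [(1, 1), (1, 1), (1, 2)]   # cases with X2 observed
missing_x1 = [1, 2]                   # cases 4 and 5: X1 observed, X2 = ?

def fractional_update(p21, p22):
    """Updated beta parameters for F21, F22 given P(X2=1|X1=1)=p21 and P(X2=1|X1=2)=p22."""
    s21 = sum(1 for x1, x2 in complete if x1 == 1 and x2 == 1)
    t21 = sum(1 for x1, x2 in complete if x1 == 1 and x2 == 2)
    s22 = sum(1 for x1, x2 in complete if x1 == 2 and x2 == 1)
    t22 = sum(1 for x1, x2 in complete if x1 == 2 and x2 == 2)
    for x1 in missing_x1:             # split each missing case fractionally
        if x1 == 1:
            s21 += p21; t21 += 1 - p21
        else:
            s22 += p22; t22 += 1 - p22
    return (1 + s21, 1 + t21), (1 + s22, 1 + t22)

(a21, b21), (a22, b22) = fractional_update(1/2, 1/2)   # prior probabilities
print(a21, b21, a22, b22)                              # 3.5 2.5 1.5 1.5
(a21, b21), (a22, b22) = fractional_update(a21 / (a21 + b21), a22 / (a22 + b22))
print(a21, b21, a22, b22)                              # 43/12 29/12 1.5 1.5
```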

53 Data Items Missing Not at Random
The estimation we used for missing data items is not appropriate if whether an item is missing is not independent of its value.

54 Outline
4. Learning Parameters in a Bayesian Network
5. Learning with Missing Data Items
6. Variances in Computed Relative Frequencies
