COMP61011 : Machine Learning. Probabilistic Models + Bayes Theorem

Probabilistic Models - one of the most active areas of ML research in the last 15 years - the foundation of numerous new technologies - enables decision-making under uncertainty - Tough. Don't expect to get this immediately. It takes time.

I have four snooker balls in a bag: 2 black, 2 white. I reach in with my eyes closed. What is the probability of picking a black ball? I give this variable a name, A. p(A = black) = 1/2

Picking a black ball, then replacing, then picking black again? p(A = black, B = black) = 1/4. Why? p(A = black) p(B = black) = 1/2 × 1/2 = 1/4

Picking two black balls in sequence (i.e. no replacing)? p(A = black, B = black) = ? It is p(A = black) p(B = black | A = black) = 1/2 × 1/3 = 1/6

Probabilities and Conditional Probabilities: p(A = black), p(B = black | A = black). Events A, B, C, etc. are random variables, e.g. A is the random event of picking the first ball, B is the random event of picking the second ball. We write p(A = 1), p(B = 1 | A = 1), where 1 means the ball was black.

Rules of Probability Theory: p(A = 1, B = 1) = p(A = 1) p(B = 1 | A = 1). Probability that both balls are black = probability that the first is black × probability that the second is black, given that the first was black.

Shorthand notation: p(A = 1, B = 1) = p(A = 1) p(B = 1 | A = 1) is written p(A, B) = p(A) p(B | A). This means that the rule holds for all possible assignments of values to A and B.

If two events A, B are dependent: p(A, B) = p(B | A) p(A), e.g. the black/white balls example. If two events A, B are independent: p(A, B) = p(A) p(B), e.g. two consecutive rolls of a die.
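As a quick sanity check (my own sketch, not from the slides), both rules can be verified by brute-force enumeration of the snooker-ball example in Python:

from itertools import permutations
from fractions import Fraction

# Every ordering of the four balls is equally likely.
balls = ['black', 'black', 'white', 'white']
orderings = list(permutations(balls))    # 4! = 24 equally likely sequences

# Without replacement (dependent events):
# p(A=black, B=black) = p(A=black) * p(B=black | A=black) = 1/2 * 1/3 = 1/6
both_black = sum(1 for o in orderings if o[0] == 'black' and o[1] == 'black')
print(Fraction(both_black, len(orderings)))     # 1/6

# With replacement (independent events):
# p(A=black, B=black) = p(A=black) * p(B=black) = 1/2 * 1/2 = 1/4
print(Fraction(1, 2) * Fraction(1, 2))          # 1/4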

      Outlook    Temperature  Humidity  Wind    Tennis?
D1    Sunny      Hot          High      Weak    No
D2    Sunny      Hot          High      Strong  No
D3    Overcast   Hot          High      Weak    Yes
D4    Rain       Mild         High      Weak    Yes
D5    Rain       Cool         Normal    Weak    Yes
D6    Rain       Cool         Normal    Strong  No
D7    Overcast   Cool         Normal    Strong  Yes
D8    Sunny      Mild         High      Weak    No
D9    Sunny      Cool         Normal    Weak    Yes
D10   Rain       Mild         Normal    Weak    Yes
D11   Sunny      Mild         Normal    Strong  Yes
D12   Overcast   Mild         High      Strong  Yes
D13   Overcast   Hot          Normal    Weak    Yes
D14   Rain       Mild         High      Strong  No

p(wind = strong) = 6/14 = 0.4286. The chances of the wind being strong, among all days.

p(wind = strong | tennis = yes) = ? The chances of a strong wind day, given that the person enjoyed tennis. Restrict the table to the nine days on which tennis = Yes:

      Outlook    Temperature  Humidity  Wind    Tennis?
D3    Overcast   Hot          High      Weak    Yes
D4    Rain       Mild         High      Weak    Yes
D5    Rain       Cool         Normal    Weak    Yes
D7    Overcast   Cool         Normal    Strong  Yes
D9    Sunny      Cool         Normal    Weak    Yes
D10   Rain       Mild         Normal    Weak    Yes
D11   Sunny      Mild         Normal    Strong  Yes
D12   Overcast   Mild         High      Strong  Yes
D13   Overcast   Hot          Normal    Weak    Yes

p(wind = strong | tennis = yes) = 3/9 = 0.333

p(tennis = yes | wind = strong) = ? The chances of the person enjoying tennis, given that it is a strong wind day. Restrict the table to the six days on which wind = Strong:

      Outlook    Temperature  Humidity  Wind    Tennis?
D2    Sunny      Hot          High      Strong  No
D6    Rain       Cool         Normal    Strong  No
D7    Overcast   Cool         Normal    Strong  Yes
D11   Sunny      Mild         Normal    Strong  Yes
D12   Overcast   Mild         High      Strong  Yes
D14   Rain       Mild         High      Strong  No

p(tennis = yes | wind = strong) = 3/6 = 0.5
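A minimal sketch (my own helper code, not part of the lecture) that reproduces these numbers by counting rows of the table; the prob function and column names are just illustrative choices:

# (outlook, temperature, humidity, wind, tennis) for days D1..D14
data = [
    ('Sunny','Hot','High','Weak','No'),         ('Sunny','Hot','High','Strong','No'),
    ('Overcast','Hot','High','Weak','Yes'),     ('Rain','Mild','High','Weak','Yes'),
    ('Rain','Cool','Normal','Weak','Yes'),      ('Rain','Cool','Normal','Strong','No'),
    ('Overcast','Cool','Normal','Strong','Yes'),('Sunny','Mild','High','Weak','No'),
    ('Sunny','Cool','Normal','Weak','Yes'),     ('Rain','Mild','Normal','Weak','Yes'),
    ('Sunny','Mild','Normal','Strong','Yes'),   ('Overcast','Mild','High','Strong','Yes'),
    ('Overcast','Hot','Normal','Weak','Yes'),   ('Rain','Mild','High','Strong','No'),
]
COLS = {'outlook': 0, 'temperature': 1, 'humidity': 2, 'wind': 3, 'tennis': 4}

def prob(event, given=None):
    """Estimate p(event | given) by counting matching rows of the table."""
    rows = data if given is None else [r for r in data
                                       if all(r[COLS[c]] == v for c, v in given.items())]
    hits = [r for r in rows if all(r[COLS[c]] == v for c, v in event.items())]
    return len(hits) / len(rows)

print(prob({'wind': 'Strong'}))                             # 6/14 = 0.4286...
print(prob({'wind': 'Strong'}, given={'tennis': 'Yes'}))    # 3/9  = 0.3333...
print(prob({'tennis': 'Yes'}, given={'wind': 'Strong'}))    # 3/6  = 0.5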

Using the same table:
p(temp = hot | tennis = yes) = ?
p(tennis = yes | temp = hot) = ?
p(tennis = yes | temp = hot, humidity = high) = ?

What's the use of all this? We can calculate these numbers from data, and this leads to an elegant theorem we can make use of.

A problem to solve: 1% of the population get cancer 80% of people with cancer get a positive test 9.6% of people without cancer also get a positive test The question: A person has a test for cancer that comes back positive. What is the probability that they actually have cancer? Quick guess: a) less than 1% b) somewhere between 1% and 70% c) between 70% and 80% d) more than 80%

Write down the probabilities of everything. Define variables: C: 1 = presence of cancer, 0 = no cancer; E: 1 = positive test, 0 = negative test. The prior probability of cancer in the population is 1%, so p(C = 1) = 0.01. The probability of a positive test given there is cancer: p(E = 1 | C = 1) = 0.8. If there is no cancer, we still have p(E = 1 | C = 0) = 0.096. The question is: what is p(C = 1 | E = 1)?

Working with Concrete Numbers: 10,000 patients.
p(C = 1) = 0.01  →  100 with cancer
p(C = 0) = 0.99  →  9,900 without cancer
p(E = 1 | C = 1) = 0.8    →  80 cancer, positive test; 20 cancer, negative test
p(E = 1 | C = 0) = 0.096  →  950.4 no cancer, positive test; 8,949.6 no cancer, negative test
How many people from the 10,000 get E = 1? How many of those have C = 1?
p(C = 1 | E = 1) = 80 / (80 + 950.4) = 0.0776, i.e. 7.76%
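The same frequency-tree argument can be checked with a few lines of arithmetic; this is a minimal sketch using the notional population of 10,000 patients from above:

population = 10_000
cancer = population * 0.01                  # 100 people with cancer
no_cancer = population - cancer             # 9,900 people without cancer

cancer_positive = cancer * 0.8              # 80 true positives
no_cancer_positive = no_cancer * 0.096      # 950.4 false positives

# Of everyone who tests positive, what fraction actually has cancer?
print(cancer_positive / (cancer_positive + no_cancer_positive))   # 0.0776... ~ 7.76%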

Surprising result! Do you trust your doctor? Although the probability of a positive test given cancer is 80%, the probability of cancer given a positive test is only about 7.8%. 8 out of 10 doctors would have said c) between 70% and 80%. WRONG!! Common mistake: the probability that a person with a positive test has cancer is not the same as the probability that a person with cancer has a positive test. One must also consider the background chance (prior) of having cancer, and the chance of receiving a false alarm from the test.

Solving the same problem, via Bayes Theorem. p(E = 1 | C = 1) = 0.8, p(C = 1) = 0.01, and p(E = 1, C = 1) = p(E = 1 | C = 1) p(C = 1). The general statement is: p(E, C) = p(E | C) p(C). And since the statement "E and C" is equivalent to "C and E": p(E, C) = p(C | E) p(E).

Solving the same problem, via Bayes Theorem. p(E, C) = p(E | C) p(C) = p(C | E) p(E). Now rearrange: p(C | E) p(E) = p(E | C) p(C), so p(C | E) = p(E | C) p(C) / p(E).

p(C | E) = p(E | C) p(C) / p(E). Rev. Thomas Bayes, 1702-1761. Bayes Theorem forms the backbone of the past 20 years of ML research into probabilistic models. Think of E as "effect" and C as "cause". But... warning: sometimes thinking this way will be very non-intuitive.

p(C = 1 | E = 1) = p(E = 1 | C = 1) p(C = 1) / p(E = 1). We want the left-hand side; we know p(E = 1 | C = 1) and p(C = 1); the denominator p(E = 1) we can calculate. Another rule of probability theory, marginalizing: p(E = 1) = Σ_c p(E = 1 | C = c) p(C = c), summing over all values c that C can take. Think of this as: given all possible things that can happen with C, what is the probability of E = 1?

p(C = 1 | E = 1) = p(E = 1 | C = 1) p(C = 1) / p(E = 1)
                 = p(E = 1 | C = 1) p(C = 1) / [ p(E = 1 | C = 1) p(C = 1) + p(E = 1 | C = 0) p(C = 0) ]
Notice the denominator now contains the same term as the numerator. We only need to know two terms here: p(E = 1 | C = 1) p(C = 1) and p(E = 1 | C = 0) p(C = 0).

p(E = 1 | C = 1) p(C = 1) = 0.8 × 0.01 = 0.008
p(E = 1 | C = 0) p(C = 0) = 0.096 × 0.99 = 0.09504
p(C = 1 | E = 1) = 0.008 / (0.008 + 0.09504) = 0.0776 = 7.76%
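The same calculation can be written as a small reusable function; this is a minimal sketch for a binary cause C and binary evidence E (the function name posterior is just an illustrative choice):

def posterior(p_e_given_c, p_e_given_not_c, prior_c):
    """Bayes Theorem for binary C and E: returns p(C=1 | E=1)
    given p(E=1 | C=1), p(E=1 | C=0) and the prior p(C=1)."""
    numerator = p_e_given_c * prior_c
    # Marginalization: p(E=1) = p(E=1|C=1)p(C=1) + p(E=1|C=0)p(C=0)
    evidence = numerator + p_e_given_not_c * (1 - prior_c)
    return numerator / evidence

# The cancer example from above:
print(posterior(0.8, 0.096, 0.01))    # 0.0776... ~ 7.76%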

Bayes theorem. Talk to your neighbours 5 mins or so.

Another Example: what year is it? You jump in a time machine. It takes you somewhere, but you don't know what year it has taken you to. You know it is one of 1885, 1955, 1985, or 2015.

What year is it? You look out the window and see a STEAM train. What are the chances of seeing this in the year 2015? Let's guess...

What year is it? In other years? And remember...

What year is it? Bayes Theorem to the rescue. We can calculate the denominator by marginalizing over the four possible years, and do the same for the other years.

What year is it? Then you look out the window and see someone wearing Nike branded trainers.

What year is it? But now our belief over what year it is has changed, because of the train. Bayes Theorem can just use this, plugging it back into the same equation.

What year is it? Prior belief, then one observation, then another: we believe we are in 1985, with p = 0.945.
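A minimal sketch of the time-machine reasoning in Python. The likelihood values below are placeholders made up purely for illustration (the lecture's actual numbers are on the slides); what matters is the mechanism: the posterior after the steam-train observation becomes the prior for the trainers observation.

years = [1885, 1955, 1985, 2015]
prior = {y: 0.25 for y in years}                # uniform prior over the four years

# Placeholder likelihoods, NOT the lecture's numbers: p(observation | year)
p_steam_train   = {1885: 0.90, 1955: 0.30, 1985: 0.05, 2015: 0.01}
p_nike_trainers = {1885: 0.00, 1955: 0.00, 1985: 0.70, 2015: 0.90}

def update(belief, likelihood):
    """One application of Bayes Theorem: posterior proportional to likelihood * prior."""
    unnormalised = {y: likelihood[y] * belief[y] for y in belief}
    evidence = sum(unnormalised.values())       # the marginal p(observation)
    return {y: v / evidence for y, v in unnormalised.items()}

belief = update(prior, p_steam_train)       # belief after seeing the steam train
belief = update(belief, p_nike_trainers)    # the previous posterior is the new prior
print(belief)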

Bayes theorem, done. Take a 15 minute break.

More Problems Solved with Probabilities. Your car is making a noise. What are the chances that the tank is empty?
The chances of the car making noise, if the tank really is empty: p(noisy = 1 | empty = 1) = 0.9
The chances of the car making noise, if the tank is not empty: p(noisy = 1 | empty = 0) = 0.2
The chances of the tank being empty, regardless of anything else: p(empty = 1) = 0.5
p(empty = 1 | noisy = 1) = ?

Bayes Theorem
p(noisy = 1 | empty = 1) p(empty = 1) = 0.9 × 0.5
p(noisy = 1 | empty = 0) p(empty = 0) = 0.2 × 0.5
p(empty = 1 | noisy = 1) = (0.9 × 0.5) / ((0.9 × 0.5) + (0.2 × 0.5)) = 0.8182
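For a quick check, the same two-line calculation in Python (a sketch, reusing the numbers above):

# p(empty=1 | noisy=1) for the noisy-car example
numerator = 0.9 * 0.5                  # p(noisy=1 | empty=1) p(empty=1)
evidence  = numerator + 0.2 * 0.5      # + p(noisy=1 | empty=0) p(empty=0)
print(numerator / evidence)            # 0.8181... ~ 0.8182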

Another Problem to Solve. A person tests positive for a certain medical disease. What are the chances that they really do have the disease?
The chances of the test being positive, if the person really is ill: p(test = 1 | disease = 1) = 0.9
The chances of the test being positive, if the person is in fact well: p(test = 1 | disease = 0) = 0.01
The chances of the condition, in the general population: p(disease = 1) = 0.05
p(disease = 1 | test = 1) = ?

Bayes Theorem
p(test = 1 | disease = 1) p(disease = 1) = 0.9 × 0.05 = 0.045
p(test = 1 | disease = 0) p(disease = 0) = 0.01 × 0.95 = 0.0095
p(disease = 1 | test = 1) = 0.045 / (0.045 + 0.0095) = 0.8257

Another Problem to Solve. Marie is getting married tomorrow, at an outdoor ceremony in the desert. In recent years, it has rained only 5 days each year. Unfortunately, the weatherman has predicted rain for tomorrow. When it actually rains, the weatherman correctly forecasts rain 90% of the time. When it doesn't rain, he incorrectly forecasts rain 20% of the time. What are the chances it will rain on the day of Marie's wedding?
The chances of the forecast saying rain, if it really does rain: p(forecastrain = 1 | rain = 1) = 0.9
The chances of the forecast saying rain, if it will be fine: p(forecastrain = 1 | rain = 0) = 0.2
The chances of rain, in the general case: p(rain = 1) = 5/365 = 0.0137
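As a sketch of the working, plugging these three numbers into the same Bayes Theorem calculation:

# p(rain=1 | forecastrain=1) for Marie's wedding
numerator = 0.9 * (5 / 365)                 # p(forecastrain=1 | rain=1) p(rain=1)
evidence  = numerator + 0.2 * (360 / 365)   # + p(forecastrain=1 | rain=0) p(rain=0)
print(numerator / evidence)                 # 0.0588... : despite the forecast, rain is unlikely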