10601 Machine Learning

Size: px

Start display at page:

Download "10601 Machine Learning"

Poppy George
5 years ago
Views:

1 10601 Machine Learning September 2, 2009 Recitation 2 Öznur Taştan 1

2 Logistics Homework 2 is going to be out tomorrow. It is due on Sep 16, Wed. There is no class on Monday Sep 7 th (Labor day) Those who have not return Homework 1 yet For details of how to submit the homework policy please check :

3 Outline We will review Some probability and statistics Some graphical models We will not go over Homework 1 Since the grace period has not ended yet. Solutions will be up next week on the web page.

4 We ll play a game: Catch the goof! I ll be the sloppy TA will make intentional mistakes Slides with mistakes are marked with You ll catch those mistakes and correct me! Correct slides are marked with

5 Catch the goof!!

6 Law of total probability Given two discrete random variables X and Y X takes values in { x x } 1,, m y1,, yn Y takes values in{ } PY ( = y) = P( X= x, Y= y) j i j j PY ( = y) = P( X= x Y= y) PY ( = y) j i j j j

7 Law of total probability Given two discrete random variables X and Y X takes values in { x x } 1,, m y1,, yn Y takes values in{ } PY ( = y) = P( X= x, Y= y) j i j j PY ( = y) = P( X= x Y= y) PY ( = y) j i j j j

8 Law of total probability Given two discrete random variables X and Y X takes values in { x x } 1,, m y1,, yn Y takes values in{ } P( X = x i ) PY ( = y) = P( X= x, Y= y) j i j j PY ( = y) = P( X= x Y= y) PY ( = y) P( X = x i ) j i j j j

9 Law of total probability Given two discrete random variables X and Y X takes values in { x x } 1,, m y1,, yn Y takes values in{ } PX ( = x) = PX ( = xy, = y) i i j j PX ( = x) = PX ( = x Y= y) PY ( = y) i i j j j

10 Law of total probability Given two discrete random variables X and Y PX ( = x) = PX ( = xy, = y) Marginal probability i i j j Joint probability = PX ( = xi Y= yj) PY ( = yj) j Conditional probability of X conditioned on Y

11 Law of total probability Given two discrete random variables X and Y Formulas are fine. Anything wrong with the names? PX ( = x) = PX ( = xy, = y) Marginal probability i i j j Joint probability = PX ( = xi Y= yj) PY ( = yj) j Conditional probability of X conditioned on Y

12 Law of total probability Given two discrete random variables X and Y PX ( = x) = PX ( = xy, = y) i i j j Joint probability of X,Y Marginal probability = PX ( = xi Y= yj) PY ( = yj) j Conditional probability of X conditioned on Y Marginal probability

13 In a strange world Two discrete random variables X and Y take binary values Joint probabilities P( X = 0, Y = 1) = 0.2 P( X = 0, Y = 0) = 0.2 P( X = 1, Y = 0) = 0.5 P( X = 1, Y = 1) = 0.5

14 In a strange world Two discrete random variables X and Y take binary values Joint probabilities P( X = 0, Y = 1) = 0.2 Should sum up to 1 P( X = 0, Y = 0) = 0.2 P( X = 1, Y = 0) = 0.5 P( X = 1, Y = 1) = 0.5

15 The world seems fine Two discrete random variables X and Y take binary values Joint probabilities P( X = 0, Y = 1) = 0.2 P( X = 0, Y = 0) = 0.2 P( X = 1, Y = 0) = 0.3 P( X = 1, Y = 1) = 0.3

16 What about the marginals? Joint probabilities P( X = 0, Y = 1) = 0.2 P( X = 0, Y = 0) = 0.2 P( X = 1, Y = 0) = 0.3 P( X = 1, Y = 1) = 0.3 Marginal probabilities P( X = 0) = 0.2 PX ( = 1) = 0.8 P( Y = 0) = 0.5 PY ( = 1) = 0.5

17 This is a strange world Joint probabilities P( X = 0, Y = 1) = 0.2 P( X = 0, Y = 0) = 0.2 P( X = 1, Y = 0) = 0.3 P( X = 1, Y = 1) = 0.3 Marginal probabilities P( X = 0) = 0.2 PX ( = 1) = 0.8 P( Y = 0) = 0.5 PY ( = 1) = 0.5

18 In a strange world Joint probabilities P( X = 0, Y = 1) = 0.2 P( X = 0, Y = 0) = 0.2 P( X = 1, Y = 0) = 0.3 P( X = 1, Y = 1) = 0.3 Marginal probabilities P( X = 0) = 0.2 PX ( = 1) = 0.8 P( Y = 0) = 0.5 PY ( = 1) = 0.5

19 This is a strange world Joint probabilities P( X = 0, Y = 1) = 0.2 P( X = 0, Y = 0) = 0.2 P( X = 1, Y = 0) = 0.3 P( X = 1, Y = 1) = 0.3 Marginal probabilities P( X = 0) = 0.2 PX ( = 1) = 0.8 P( Y = 0) = 0.5 PY ( = 1) = 0.5

20 Let s have a simple problem Joint probabilities P( X = 0, Y = 1) = 0.2 P( X = 0, Y = 0) = 0.2 P( X = 1, Y = 0) = 0.3 P( X = 1, Y = 1) = 0.3 Marginal probabilities P( X = 0) = 0.4 PX ( = 1) = 0.6 P( Y = 0) = 0.5 PY ( = 1) = 0.5 P( X = 0) = P( X = 0, Y = 0) + P( X = 1, Y = 1) =0.4

21 Conditional probabilities What is the complementary event of P(X=0 Y=1)? P(X=1 Y=1) OR P(X=0 Y=0)

22 Conditional probabilities What is the complementary event of P(X=0 Y=1)? P(X=1 Y=1) OR P(X=0 Y=0)

23 The game ends here.

24 Independent number of parameters Assume X and Y take Boolean values {0,1}: How many independent parameters do you need to fully specify: marginal probability of X? the joint probability of P(X,Y)? the conditional probability of P(X Y)?

25 Independent number of parameters Assume X and Y take Boolean values {0,1}: How many independent parameters do you need to fully specify: marginal probability of X? P(X=0) 1 parameter only [ because P(X=1)+P(X=0)=1 ] the joint probability of P(X,Y)? P(X=0, Y=0) 3 parameters P(X=0, Y=1) P(X=1, Y=0) the conditional probability of P(X Y)?

26 Number of parameters Assume X and Y take Boolean values {0,1}? How many independent parameters do you need to fully specify marginal probability of X? P(X=0) 1 parameter only P(X=1)= 1 P(X=0) How many independent parameters do you need to fully specify the joint probability of P(X,Y)? P(X=0, Y=0) 3 parameters P(X=0, Y=1) P(X=1, Y=0) How many independent parameters do you need to fully specify the conditional probability of P(X Y)? P(X=0 Y=0) 2 parameters P(X=0 Y=1)

27 Number of parameters What about P(X Y,Z), how many independent parameters do you need to be able to fully specify the probabilities? Assume each RV takes: P(X Y,Z) m values n values q values

28 Number of parameters What about P(X Y,Z), how many independent parameters do you need to be able to fully specify the probabilities? Assume each RV takes: P(X Y,Z) m values n values q values Number of independent parameters: (m 1)*nq

29 Graphical models A graphical model is a way of representing probabilistic relationships between random variables Variables are represented by nodes: Edges indicates probabilistic relationships: P(X,X..X ) = P(X Pa(X )) 1 2 n i i i= 1 You miss the bus Arrive class late

30 Serial connection Is X and Z independent? X X Z? Y Z

31 Serial connection Is X and Z independent? X X Z Y X and Z are not independent Z

32 Serial connection Is X conditionally independent of Z given Y? X X Z Y? Y Z

33 Serial connection Is X conditionally independent of Z given Y? X X Z Y Y Yes they are independent Z

34 How can we show it? Is X conditionally independent of Z given Y? X P( XYZ,, ) = PXPY ( ) ( XPZ ) ( Y) Y Z P( Z X, Y) PXYZ (,, ) = PXY (, ) P( XPY ) ( XPZY ) ( ) = PXPY ( ) ( X) = PZ ( Y) X Z Y

35 An example case Studied late last night Wake up late Arrive class late

36 Common cause Age Z X Y Shoe Size Gray Hair X and Y are not marginally independent X and Y are conditionally independent given Z

37 Explaining away Flu X Z Allergy Y Sneeze X and Z marginally independent X and Z conditionally dependent given Y

38 D separation X and Z are conditionally independent given Y if Y d separates X and Z X X X X Y Y Y Y Neither Y nor its descendants should be observed Z Z Z Z Path between X and Z is blocked by Y

39 D separation example Is B, C independent given A?

40 D separation example Is B, C independent given A? Yes

41 D separation example Observed, A blocks the path Is B, C independent given A? Yes

42 Observed, A blocks the path Is B, C independent given A? Yes not observed neither its descendants

43 D separation example Is A, F independent given E?

44 Is A, F independent given E? Yes

45 Is A, F independent given E? Yes

46 Is C, D independent given F?

47 Is C, D independent given F? No

48 Is A, G independent given B and F?

49 Is A, G independent given B and F? Yes

50 Naïve Bayes Model J D C R J: The person is a junior D: The person knows calculus C: The person leaves in campus R: Saw the Return of the King more than once

51 Naïve Bayes Model J What parameters are stored? D C R J: The person is a junior D: The person knows calculus C: The person leaves in campus R: Saw the Return of the King more than once

52 Naïve Bayes Model J P(J)= D C R P(D/J=1)= P(D/J=0)= P(C/J=1)= P(C/J=0)= P(R/J=1)= P(R/J=0)= J : The person is a junior D: The person knows calculus C: The person leaves in campus R: Saw the Return of the King more than once

53 Naïve Bayes Model J P(J)= D C R P(D/J=1)= P(D/J=0)= P(C/J=1)= P(C/J=0)= P(R/J=1)= P(R/J=0)= J: The person is a junior D: The person knows calculus C: The person leaves in campus R: Saw the Return of the King more than once

54 We have the structure how do we get the CPTs? Estimate them from observed data Are you a junior? Do you know calculus? Do you live in campus? Have you seen 'Return of the King more than once? Student Student Student Student Student Student Student Student Student Student Student Student Student Student Student Student Student Student Student Student

55 What is the probability that he is a Junior? Naïve Bayes Model P(J)= J: The person is a junior D: The person knows calculus C: The person leaves in campus R: Saw the Return of the King more than once J D C R P(R/J)= P(R/~J)= P(C/J)= P(C/J)= Suppose a new person come and says: P(C/~J)= I don t know calculus P(C/~J)= I live in campus I have seen The return of the king five times

56 Naïve Bayes Model J Suppose a person says: I don t know calculus D=0 I live in campus C=1 I have not seen The return of the king five times R=1 D C R What is the probability that he is a Junior? P(J=1/D=0,C=1,R=1)

57 What is the probability that he is a Junior? P(J=1/D=0,C=1,R=1) = P(J=1,D=0,C=1,R=1) P(D = 0,C = 1,R = 1) = P(J = 1)P(C= 1/J= 1)P(R = 1/J= 1)P(D= 0/J= 1) P(D = 0,C = 1,R = 1) To calculate this marginalize over J J D C R

58 Naïve Bayes Model P(J=1/D=0,C=1,R=1) P(J = 1)P(C= 1/J= 1)P(R = 1/J= 1)P(D= 0/J= 1) = P(D = 0,C = 1,R = 1)

Introduction to probability and statistics

Introduction to probability and statistics Alireza Fotuhi Siahpirani & Brittany Baur sroy@biostat.wisc.edu Computational Network Biology Biostatistics & Medical Informatics 826 https://compnetbiocourse.discovery.wisc.edu