Discrete Bayes. Robert M. Haralick. Computer Science, Graduate Center City University of New York


1 Discrete Bayes Robert M. Haralick Computer Science, Graduate Center City University of New York

2 Outline 1 The Basic Perspective 2 3

3 Perspective

4 Unit of Observation Not Equal to Unit of Classification Consider the task of recognizing a truck with a machine gun mounted on its back end in the context of guerrilla urban warfare. The sensor is a camera. The unit of observation is the pixel, millions of them. No one pixel of the truck carries sufficient information about the class.

5 Grouping The units of observation have to be grouped into relevant groups. Classification must be based on features extracted from the relevant groups.

6 Cyber Attack Units of Observation The units of observation are transaction records. For example, the records might be records of the secure log. The problem is to decide whether or not an IP address should be banned.
Aug 16 04:41:06 cunygrid sshd[12535]: Invalid user alexis from
Aug 16 04:41:06 cunygrid sshd[12543]: input_userauth_request: invalid user alexis
Aug 16 04:41:06 cunygrid sshd[12535]: pam_unix(sshd:auth): check pass; user unknown
Aug 16 04:41:06 cunygrid sshd[12535]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=

7 Cyber Attack Units of Observation Each observation is a record. A record is a set of attribute-value pairs. Attributes: Date, Time, Connection ID, Login ID, IP address, Login ID accepted/rejected, Password, Password accepted/rejected, Disconnection.

8 Grouping Units of Observation Thread by connection ID. Thread by login ID. Thread by IP address. Thread by time interval.

9 Extracting Features For a set of records in the given time interval with the same IP address: number of records, number of login ID records, number of login IDs accepted, number of disconnects.

10 Extracting Features For a set of records in the given time interval with the same IP address and login ID: number of records, number of passwords attempted, number of passwords accepted, number of disconnects. A sketch of this grouping and counting step follows below.
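
A minimal sketch of the grouping and feature-extraction step in Python. The parsed record fields, event names, and values below are assumptions for illustration, not the course's actual log parser or feature list.

```python
from collections import defaultdict

# Hypothetical parsed records: in practice these would come from parsing the
# secure log; the field names and values here are made up for illustration.
records = [
    {"ip": "10.0.0.5", "login_id": "alexis", "event": "invalid_user"},
    {"ip": "10.0.0.5", "login_id": "alexis", "event": "auth_failure"},
    {"ip": "10.0.0.5", "login_id": "root",   "event": "auth_failure"},
    {"ip": "10.0.0.9", "login_id": "bob",    "event": "accepted"},
    {"ip": "10.0.0.9", "login_id": "bob",    "event": "disconnect"},
]

# Group by IP address (one of the threading choices on slide 8).
groups = defaultdict(list)
for r in records:
    groups[r["ip"]].append(r)

# Extract a feature vector per group: counts of records, distinct login IDs,
# accepted logins, and disconnects.
for ip, recs in groups.items():
    features = (
        len(recs),
        len({r["login_id"] for r in recs}),
        sum(r["event"] == "accepted" for r in recs),
        sum(r["event"] == "disconnect" for r in recs),
    )
    print(ip, features)
```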

11 Grouping and Feature Extracting The process of grouping the raw observation units into the unit of observation used in the machine learning process, and the process of defining the feature vector for the grouped units, is part of the art of Machine Learning.

12 Be Rational, Use Expected Value When faced with a number of actions, each of which could give rise to more than one possible outcome with different probabilities, the rational approach is to: identify all possible outcomes; determine their values or utilities (positive or negative); determine the probabilities of each outcome under each course of action; multiply value by probability and sum to give an expected value. The best action is the one that gives rise to the highest expected value. (Blaise Pascal, Pensées, 1670)

13 Highest Expected Value a_1, a_2: two possible actions. e_1, e_2, e_3: three possible economic gains, with values e_1 = 1, e_2 = 3, e_3 = 0. The probabilities of the three gains under action a_1 are P(e_1 | a_1), P(e_2 | a_1), P(e_3 | a_1); under action a_2 they are P(e_1 | a_2), P(e_2 | a_2), P(e_3 | a_2). Weighting each gain by its probability P(e | a) and summing gives the expected value of each action: 1.75 under a_1 and 1.1 under a_2. Choose action a_1.
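
A minimal sketch of this expected-value comparison in Python. The gains come from the slide; the probability vectors are assumed values chosen only so that the expected gains come out to 1.75 and 1.1.

```python
# Expected-value action selection (Pascal's rule on slide 12).
# The gains come from the slide; the probability vectors are assumed values
# that happen to reproduce the slide's expected gains of 1.75 and 1.1.
gains = [1.0, 3.0, 0.0]                      # e_1, e_2, e_3

prob_given_action = {
    "a1": [0.25, 0.50, 0.25],                # assumed P(e_i | a_1)
    "a2": [0.50, 0.20, 0.30],                # assumed P(e_i | a_2)
}

expected = {
    a: sum(g * p for g, p in zip(gains, probs))
    for a, probs in prob_given_action.items()
}
best = max(expected, key=expected.get)

print(expected)        # {'a1': 1.75, 'a2': 1.1}  (up to float rounding)
print("choose", best)  # choose a1
```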

14 Bayes Theorem c: class. d: observed data. P(d | c): the class conditional probabilities. Current state of knowledge: P(c). New evidence: the observed data d. What is the updated probability of c, P(c | d)?
P(c \mid d) = P(c, d) / P(d)
P(d \mid c) = P(c, d) / P(c)
P(c \mid d) = P(d \mid c) P(c) / P(d)
Thomas Bayes first showed how to use new evidence to update beliefs.
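
A small numeric sketch of this update; the two classes and all probability values below are assumptions for illustration.

```python
# Bayes theorem: P(c | d) = P(d | c) P(c) / P(d), with
# P(d) = sum over classes of P(d | c) P(c).  All numbers are assumed.
prior = {"c1": 0.7, "c2": 0.3}               # P(c)
likelihood = {"c1": 0.2, "c2": 0.6}          # P(d | c) for one observed d

evidence = sum(likelihood[c] * prior[c] for c in prior)            # P(d)
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}

print(posterior)   # approximately {'c1': 0.4375, 'c2': 0.5625}
```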

15 Units Observed In what follows, the unit being observed is assumed to be that which arises from the grouping process. The measurement is assumed to be the feature vector.

16 Basic Process Outline In the pattern discrimination or pattern identification process, a unit is observed or measured and a category assignment is made on the basis of the measurement. This event can be characterized by its three parts: the unit has true category identification c_j, a member of the set C; the decision rule assignment is to category c_k, a member of the set C; the measurement d is made from the set D.

17 The unit has true category identification c_j, a member of the set C; the decision rule assignment is to category c_k, a member of the set C; the measurement d is made from the set D. Denote this event by (c_j, c_k, d). It has a probability of occurring: P(True = c_j, Assigned = c_k, Meas = d).

18 Economic Consequences The act of making the assignment carries consequences, economically or in terms of utility. Assumption These consequences depend only on which category is the true category identification for the unit and which category is the assigned category identification for the unit. They do not depend on which particular unit is being assigned or on what measurement values the unit to be assigned has.

19 Economic Consequences: Gain Matrix The economic gain matrix has one row for each true category c_1, ..., c_K and one column for each assigned category c_1, ..., c_K; the entry in row c_j and column c_k is the gain e(c_j, c_k). The Economic Gain Matrix is Application Dependent.

20 Confusion Matrix The confusion matrix has one row for each true category c_1, ..., c_K and one column for each assigned category c_1, ..., c_K; the entry in row c_j and column c_k is the probability P_{TA}(c_j, c_k). The Confusion Matrix Tells How Correct the Machine Learning Is.

21 Probability of Correct Identification Definition The probability of correct identification, P_c, is the probability that the True Identification equals the Assigned Identification. It is defined by
P_c = \sum_{j=1}^{K} P_{TA}(c_j, c_j),
the sum of the diagonal entries of the confusion matrix.

22 Confusion Probabilities What is the probability of observing a unit whose true class identification is c_j and whose assigned class identification is c_k?
P_{TA}(c_j, c_k) = \sum_{d \in D} P_{TA}(c_j, c_k, d)

23 Expected Economic Gain For each true-assigned category identification pair (c_j, c_k) we know its economic gain e(c_j, c_k) and its probability P_{TA}(c_j, c_k).

24 Expected Economic Gain How do we compute the average or expected value of the consequence?
E[e] = \sum_{j=1}^{K} \sum_{k=1}^{K} e(c_j, c_k) P_{TA}(c_j, c_k)
     = \sum_{j=1}^{K} \sum_{k=1}^{K} e(c_j, c_k) \sum_{d \in D} P_{TA}(c_j, c_k, d)

25 Expected Gain The confusion matrix P_{TA}(c_j, c_k) and the economic gain matrix e(c_j, c_k) are both K \times K arrays indexed by true category (rows) and assigned category (columns). Combining them entry by entry gives
E[e] = \sum_{j=1}^{K} \sum_{k=1}^{K} e(c_j, c_k) P_{TA}(c_j, c_k)
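
A minimal sketch of this computation; the 2 x 2 confusion and gain matrices below are assumed numbers, not values from the lecture.

```python
# Expected gain E[e] = sum_j sum_k e(c_j, c_k) * P_TA(c_j, c_k).
# Rows index the true class, columns the assigned class.  All numbers assumed.
confusion = [[0.45, 0.05],     # P_TA(c_1, c_1), P_TA(c_1, c_2)
             [0.10, 0.40]]     # P_TA(c_2, c_1), P_TA(c_2, c_2)
gain = [[1.0, -1.0],           # e(c_1, c_1), e(c_1, c_2)
        [-2.0, 2.0]]           # e(c_2, c_1), e(c_2, c_2)

expected_gain = sum(
    gain[j][k] * confusion[j][k]
    for j in range(len(gain))
    for k in range(len(gain[j]))
)
print(expected_gain)   # 1.0 (up to float rounding)
```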

26 Analysis What are the processes which lead to the event (c_j, c_k, d) occurring? 1 There is a unit having measurement d whose true class identification is c_j. 2 Using only the measurement d, the decision rule makes the assignment to class c_k.

27 Fair Game Assumption Nature and the Decision Rule are not in collusion. In making the class assignment, the decision rule must use only the measurement data; it cannot use the true class identification. Nature uses only d in determining the true class identification c_j; the decision rule uses only d in determining the assigned class identification c_k.

28 Fair Game Assumption c_k: assigned class. c_j: true class. d: measurement data.
P_{AT}(c_k \mid (c_j, d)) = P_A(c_k \mid d)
Therefore,
P_{AT}(c_k, c_j \mid d) = P_{AT}(c_k, c_j, d) / P(d)
                        = P_{AT}(c_k \mid c_j, d) P_T(c_j, d) / P(d)
                        = P_A(c_k \mid d) P_T(c_j \mid d)

29 Fair Game Assumption Nature and the decision rule are not in collusion. Given that a unit has measurement d, the assigned class which the decision rule determines and the true class which nature determines are statistically independent:
P_{AT}(c_k, c_j \mid d) = P_A(c_k \mid d) P_T(c_j \mid d)

30 The Probabilistic Decision Rule Let f_d(c_k) be the conditional probability that the decision rule assigns a unit to class c_k given that the unit has measurement d. We call f the decision rule.

31 The Probabilistic Decision Rule Since, conditioned on measurement d, the true and assigned class are statistically independent,
P_{TA}(c_j, c_k \mid d) = P_T(c_j \mid d) f_d(c_k)

32 The Probabilistic Decision Rule By definition of conditional probability,
P_{TA}(c_j, c_k \mid d) = P_{TA}(c_j, c_k, d) / P(d)
P_{TA}(c_j, c_k, d) = P_{TA}(c_j, c_k \mid d) P(d)

33 The Probabilistic Decision Rule
P_{TA}(c_j, c_k, d) = P_{TA}(c_j, c_k \mid d) P(d)
                    = P_T(c_j \mid d) f_d(c_k) P(d)
                    = P_T(c_j, d) f_d(c_k)

34 Expected Gain The expected gain is then
E[e; f] = \sum_{j=1}^{K} \sum_{k=1}^{K} e(c_j, c_k) \sum_{d \in D} P_{TA}(c_j, c_k, d)
        = \sum_{j=1}^{K} \sum_{k=1}^{K} e(c_j, c_k) \sum_{d \in D} P_T(c_j, d) f_d(c_k)
        = \sum_{d \in D} \sum_{k=1}^{K} f_d(c_k) \sum_{j=1}^{K} e(c_j, c_k) P_T(c_j, d)

35 Maximizing a_1, ..., a_L is a sequence of L numbers. p_1, ..., p_L is a sequence of unknown numbers satisfying p_i \ge 0 and \sum_{i=1}^{L} p_i = 1. Find an upper bound for \sum_{i=1}^{L} a_i p_i over all possible such sequences p_1, ..., p_L satisfying the constraints.

36 Maximizing Since p_i \ge 0 and a_i \le \max_{j=1,...,L} a_j for i = 1, ..., L,
a_i p_i \le (\max_{j=1,...,L} a_j) p_i

37 Maximizing Since the inequality holds for each i, it must also hold for the sum:
\sum_{i=1}^{L} a_i p_i \le (\max_{j=1,...,L} a_j) \sum_{i=1}^{L} p_i = \max_{j=1,...,L} a_j

38 Maximizing If p_1, ..., p_L is a sequence of L numbers satisfying p_i \ge 0 and \sum_{i=1}^{L} p_i = 1, then
\sum_{i=1}^{L} a_i p_i \le \max_{j=1,...,L} a_j.
Thus \max_{j=1,...,L} a_j is an upper bound for \sum_{i=1}^{L} a_i p_i.

39 Maximizing Is this upper bound, \max_{j=1,...,L} a_j, the lowest possible upper bound for \sum_{i=1}^{L} a_i p_i? Yes, since it is achievable by an appropriate choice of the p_i: set p_i = 0 if a_i < \max_{j=1,...,L} a_j, and set the remaining p_i in any way so that p_i \ge 0 and \sum_{i=1}^{L} p_i = 1.

40 Maximizing Suppose a_{ij}, i = 1, 2, ..., L and j = 1, 2, ..., J, is a sequence of numbers and p_{ij} satisfies p_{ij} \ge 0 and \sum_{j=1}^{J} p_{ij} = 1 for i = 1, 2, ..., L. Then the lowest upper bound for \sum_{j=1}^{J} a_{ij} p_{ij} is \max_{k=1,...,J} a_{ik}, for each i = 1, 2, ..., L.

41 Maximizing Since p_{ij} \ge 0 and a_{ij} \le \max_k a_{ik} for each i and j,
p_{ij} a_{ij} \le p_{ij} \max_k a_{ik} for each i and j.
Summing over j,
\sum_{j=1}^{J} p_{ij} a_{ij} \le \sum_{j=1}^{J} p_{ij} \max_k a_{ik} = \max_k a_{ik} \sum_{j=1}^{J} p_{ij} = \max_k a_{ik}.
Thus \max_k a_{ik} is an upper bound for \sum_{j=1}^{J} p_{ij} a_{ij}.

42 Maximizing \max_k a_{ik} is the lowest upper bound for \sum_{j=1}^{J} p_{ij} a_{ij} since it is achievable: set p_{ij} = 0 if a_{ij} < \max_k a_{ik}, and set the remaining p_{ij} in any way so that p_{ij} \ge 0 and \sum_{j=1}^{J} p_{ij} = 1.
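
A small numeric check of this argument; the a values and the probability vector below are assumptions for illustration.

```python
# For any probability vector p, sum_i a_i * p_i <= max_i a_i, and the bound
# is met by putting all probability on an index achieving the max.
# The numbers below are assumed for illustration.
a = [2.0, 5.0, 3.0]
p = [0.2, 0.5, 0.3]                          # any p_i >= 0 summing to 1

weighted = sum(ai * pi for ai, pi in zip(a, p))
bound = max(a)
print(weighted, "<=", bound)                 # prints roughly: 3.8 <= 5.0

# Achieving the bound: all mass on an index attaining the maximum.
p_star = [1.0 if ai == bound else 0.0 for ai in a]
print(sum(ai * pi for ai, pi in zip(a, p_star)))   # 5.0
```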

43 Expected Gain The expected gain is then
E[e; f] = \sum_{j=1}^{K} \sum_{k=1}^{K} e(c_j, c_k) \sum_{d \in D} P_{TA}(c_j, c_k, d)
        = \sum_{j=1}^{K} \sum_{k=1}^{K} e(c_j, c_k) \sum_{d \in D} P_T(c_j, d) f_d(c_k)
        = \sum_{d \in D} \sum_{k=1}^{K} f_d(c_k) \sum_{j=1}^{K} e(c_j, c_k) P_T(c_j, d)

44 Maximizing Expected Gain But for any conditional probability f_d(c),
\sum_{d \in D} \sum_{k=1}^{K} f_d(c_k) \sum_{j=1}^{K} e(c_j, c_k) P_T(c_j, d) \le \sum_{d \in D} \max_n \sum_{j=1}^{K} e(c_j, c_n) P_T(c_j, d)

45 Maximizing Expected Gain If for each d there exists a unique m = m(d) such that
\sum_{j=1}^{K} e(c_j, c_k) P_T(c_j, d) \le \sum_{j=1}^{K} e(c_j, c_m) P_T(c_j, d) for each k,
then setting f_d(c_m) = 1 and f_d(c_k) = 0 for k \ne m achieves the upper bound.

46 Maximizing Expected Gain In general, the upper bound is achievable by using a decision rule f defined by f_d(c_k) = 0 for each c_k and d such that
\sum_{j=1}^{K} e(c_j, c_k) P_T(c_j, d) < \max_n \sum_{j=1}^{K} e(c_j, c_n) P_T(c_j, d),
and the remaining f_d(c_k) set in any way so that f_d(c_k) \ge 0 and \sum_{k=1}^{K} f_d(c_k) = 1.

47 Bayes Decision Rule Definition A decision rule f is called a Bayes decision rule if and only if E[e; f] \ge E[e; g] for any decision rule g.

48 Bayes Decision Rule A Bayes rule can always be implemented as a deterministic decision rule. For a given measurement d, take the probabilities P_T(c_1, d), ..., P_T(c_K, d) for the true classes together with the gain matrix entries e(c_j, c_k), and for each candidate assignment c_k compute
\sum_{j=1}^{K} e(c_j, c_k) P_T(c_j, d).

49 Bayes Decision Rule Given measurement d, assign to any class c_k satisfying
\sum_{j=1}^{K} e(c_j, c_k) P_T(c_j, d) \ge \sum_{j=1}^{K} e(c_j, c_n) P_T(c_j, d), n = 1, ..., K.
If the economic gain matrix e is the identity, then assign to any class c_k satisfying
P_T(c_k, d) \ge P_T(c_n, d), n = 1, ..., K,
and
E[e] = \sum_{d \in D} \max_{n=1,...,K} P_T(c_n, d) = P_correct
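
A sketch of this rule as code; the joint probabilities P_T(c, d) and the identity gain matrix below are assumed numbers, not values from the lecture.

```python
# Discrete Bayes rule: for each measurement d, assign the class c_k that
# maximizes sum_j e(c_j, c_k) * P_T(c_j, d).  All numbers are assumed.
P_T = {                      # joint probabilities P_T(c, d), summing to 1
    ("c1", "d1"): 0.30, ("c1", "d2"): 0.10, ("c1", "d3"): 0.10,
    ("c2", "d1"): 0.05, ("c2", "d2"): 0.25, ("c2", "d3"): 0.20,
}
classes = ["c1", "c2"]
measurements = ["d1", "d2", "d3"]
e = {("c1", "c1"): 1.0, ("c1", "c2"): 0.0,   # gain matrix e(true, assigned)
     ("c2", "c1"): 0.0, ("c2", "c2"): 1.0}

def bayes_rule(d):
    """Return the class assignment for measurement d."""
    return max(classes,
               key=lambda ck: sum(e[(cj, ck)] * P_T[(cj, d)] for cj in classes))

f = {d: bayes_rule(d) for d in measurements}
print(f)   # {'d1': 'c1', 'd2': 'c2', 'd3': 'c2'}
```

With the identity gain matrix, as the slide notes, this reduces to picking the class with the largest P_T(c, d) for each measurement.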

50 Bayes Decision Rule A worked example: a table of P_T(c, d) for two true classes c_1, c_2 and three measurements d_1, d_2, d_3, together with a gain matrix e, determines a decision rule f_d(c) assigning a class to each measurement; the resulting expected gain is E[e] = 1.02.

51 The Confusion Matrix
P_{TA}(c_1, c_1) = \sum_{d \in D} f_d(c_1) P_T(c_1, d)
P_{TA}(c_1, c_2) = \sum_{d \in D} f_d(c_2) P_T(c_1, d)
P_{TA}(c_2, c_1) = \sum_{d \in D} f_d(c_1) P_T(c_2, d)
P_{TA}(c_2, c_2) = \sum_{d \in D} f_d(c_2) P_T(c_2, d)
These four entries form the 2 \times 2 confusion matrix P_{TA}, with rows indexed by the true class and columns by the assigned class.

52 The Confusion Matrix An example: a table of P_T(c, d) for true classes c_1, c_2 and measurements d_1, d_2, d_3, with the decision rule f_d assigning c_2, c_2, c_1 to d_1, d_2, d_3 respectively, yields the corresponding 2 \times 2 confusion matrix P_{TA} together with the true-class marginals P_T.

53 The Confusion Matrix The same P_T(c, d) table with a different decision rule f_d, now assigning c_2, c_1, c_1 to d_1, d_2, d_3, yields a different confusion matrix P_{TA}.
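
A sketch of how the confusion matrix follows from a deterministic decision rule and the joint probabilities; it reuses the assumed P_T and rule f from the sketch after slide 49.

```python
# Confusion matrix P_TA(c_j, c_k) = sum_d f_d(c_k) * P_T(c_j, d) for a
# deterministic rule f (f_d(c_k) is 1 for the assigned class, else 0).
# The joint probabilities and the rule below are assumed values.
P_T = {
    ("c1", "d1"): 0.30, ("c1", "d2"): 0.10, ("c1", "d3"): 0.10,
    ("c2", "d1"): 0.05, ("c2", "d2"): 0.25, ("c2", "d3"): 0.20,
}
f = {"d1": "c1", "d2": "c2", "d3": "c2"}     # assigned class per measurement
classes = ["c1", "c2"]

confusion = {
    (cj, ck): sum(P_T[(cj, d)] for d, assigned in f.items() if assigned == ck)
    for cj in classes for ck in classes
}
print(confusion)
# {('c1', 'c1'): 0.3, ('c1', 'c2'): 0.2, ('c2', 'c1'): 0.05, ('c2', 'c2'): 0.45}

# Probability of correct identification = sum of the diagonal entries.
print(sum(confusion[(c, c)] for c in classes))   # 0.75 (up to float rounding)
```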

54 Bayes Decision Rule Given measurement d, assign to any class c_k satisfying
\sum_{j=1}^{K} e(c_j, c_k) P_T(c_j, d) \ge \sum_{j=1}^{K} e(c_j, c_n) P_T(c_j, d), n = 1, ..., K.
Or equivalently, dividing both sides by P(d), assign to any class c_k satisfying
\sum_{j=1}^{K} e(c_j, c_k) P_T(c_j, d) / P(d) \ge \sum_{j=1}^{K} e(c_j, c_n) P_T(c_j, d) / P(d), n = 1, ..., K.

55 Bayes Decision Rule Given measurement d, assign to any class c_k satisfying
\sum_{j=1}^{K} e(c_j, c_k) P_T(c_j \mid d) \ge \sum_{j=1}^{K} e(c_j, c_n) P_T(c_j \mid d), n = 1, ..., K.
Therefore, the only probability information we need for a Bayes rule is the conditional probabilities P_T(c_j \mid d), j = 1, ..., K.

56 Two Class Bayes Rule Assume the economic gain for a correct assignment is greater than the gain for an incorrect assignment: e(c_1, c_1) > e(c_1, c_2) and e(c_2, c_2) > e(c_2, c_1). Assume all probabilities are positive. Assign to class c_1 if
P(c_1 \mid d) e(c_1, c_1) + P(c_2 \mid d) e(c_2, c_1) \ge P(c_1 \mid d) e(c_1, c_2) + P(c_2 \mid d) e(c_2, c_2);
otherwise assign to class c_2. This inequality can be simplified:
P(c_1 \mid d) e(c_1, c_1) - P(c_1 \mid d) e(c_1, c_2) \ge P(c_2 \mid d) e(c_2, c_2) - P(c_2 \mid d) e(c_2, c_1)
P(c_1 \mid d) (e(c_1, c_1) - e(c_1, c_2)) \ge P(c_2 \mid d) (e(c_2, c_2) - e(c_2, c_1))
P(c_1 \mid d) / P(c_2 \mid d) \ge (e(c_2, c_2) - e(c_2, c_1)) / (e(c_1, c_1) - e(c_1, c_2))

57 Two Class Bayes Rule: Odds Ratio Assign to class c_1 if
P(c_1 \mid d) / P(c_2 \mid d) \ge (e(c_2, c_2) - e(c_2, c_1)) / (e(c_1, c_1) - e(c_1, c_2));
otherwise assign to class c_2. The odds ratio R in favor of class c_1 is defined by R(d) = P(c_1 \mid d) / P(c_2 \mid d). Assign to class c_1 if
R(d) \ge (e(c_2, c_2) - e(c_2, c_1)) / (e(c_1, c_1) - e(c_1, c_2)) = \Theta

58 Two Class Bayes Rule: Likelihood Ratio The likelihood ratio in favor of class c_1 is defined by
L(d) = P(d \mid c_1) / P(d \mid c_2) = [P(c_1 \mid d) P(d) / P(c_1)] / [P(c_2 \mid d) P(d) / P(c_2)].
Starting from the assignment condition P(c_1 \mid d) / P(c_2 \mid d) \ge (e(c_2, c_2) - e(c_2, c_1)) / (e(c_1, c_1) - e(c_1, c_2)), assign to class c_1 if
L(d) \ge [(e(c_2, c_2) - e(c_2, c_1)) / (e(c_1, c_1) - e(c_1, c_2))] \cdot [P(c_2) / P(c_1)] = \Theta
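
A minimal sketch of the two-class likelihood-ratio test; the gains, priors, and class-conditional probabilities below are assumed numbers for illustration.

```python
# Two-class likelihood-ratio test: assign c1 when L(d) >= Theta, where
# Theta = (e22 - e21) / (e11 - e12) * P(c2) / P(c1).  All numbers assumed.
e11, e12, e21, e22 = 1.0, 0.0, 0.0, 1.0      # gain matrix e(true, assigned)
prior_c1, prior_c2 = 0.6, 0.4                # P(c1), P(c2)

# Class-conditional probabilities P(d | c) over a discrete measurement space.
p_d_given_c1 = {"d1": 0.7, "d2": 0.2, "d3": 0.1}
p_d_given_c2 = {"d1": 0.1, "d2": 0.4, "d3": 0.5}

theta = (e22 - e21) / (e11 - e12) * prior_c2 / prior_c1

def assign(d):
    L = p_d_given_c1[d] / p_d_given_c2[d]    # likelihood ratio L(d)
    return "c1" if L >= theta else "c2"

print(theta)                                  # roughly 0.667
print({d: assign(d) for d in ["d1", "d2", "d3"]})
# {'d1': 'c1', 'd2': 'c2', 'd3': 'c2'}
```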

59 Two Class Bayes Rule: Class Conditional Probability Assign to class c_1 if
P(c_1 \mid d) e(c_1, c_1) + P(c_2 \mid d) e(c_2, c_1) \ge P(c_1 \mid d) e(c_1, c_2) + P(c_2 \mid d) e(c_2, c_2);
otherwise assign to class c_2. Then
P(c_1 \mid d) (e(c_1, c_1) - e(c_1, c_2)) \ge P(c_2 \mid d) (e(c_2, c_2) - e(c_2, c_1))
P(c_1 \mid d) (e(c_1, c_1) - e(c_1, c_2)) \ge (1 - P(c_1 \mid d)) (e(c_2, c_2) - e(c_2, c_1))
P(c_1 \mid d) (e(c_1, c_1) - e(c_1, c_2) + e(c_2, c_2) - e(c_2, c_1)) \ge e(c_2, c_2) - e(c_2, c_1)
P(c_1 \mid d) \ge (e(c_2, c_2) - e(c_2, c_1)) / (e(c_1, c_1) - e(c_1, c_2) + e(c_2, c_2) - e(c_2, c_1)) = \Theta

60 Summary Given measurement d, assign to any class c_k satisfying
\sum_{j=1}^{K} e(c_j, c_k) P_T(c_j \mid d) \ge \sum_{j=1}^{K} e(c_j, c_n) P_T(c_j \mid d), n = 1, ..., K.
Therefore, the only probability information we need for a Bayes rule is the conditional probabilities P_T(c_j \mid d), j = 1, ..., K.

61 Summary
P_T(c \mid d) = P_T(d \mid c) P(c) / P(d) = P_T(d \mid c) P(c) / \sum_{\gamma \in C} P_T(d \mid \gamma) P(\gamma)
For a given c, P_T(d \mid c) is the class conditional distribution and P_T(c) is the prior probability of class c. Estimating P_T(d \mid c) from data is a major problem in Machine Learning.

62 Two Class Bayes Rule: Odds Ratio Assign to class c_1 if
P(c_1 \mid d) / P(c_2 \mid d) \ge (e(c_2, c_2) - e(c_2, c_1)) / (e(c_1, c_1) - e(c_1, c_2));
otherwise assign to class c_2. The odds ratio R in favor of class c_1 is defined by R(d) = P(c_1 \mid d) / P(c_2 \mid d). Assign to class c_1 if
R(d) \ge (e(c_2, c_2) - e(c_2, c_1)) / (e(c_1, c_1) - e(c_1, c_2)) = \Theta

63 Two Class Bayes Rule: Likelihood Ratio The likelihood ratio in favor of class c_1 is defined by L(d) = P(d \mid c_1) / P(d \mid c_2). Assign to class c_1 if
L(d) \ge [(e(c_2, c_2) - e(c_2, c_1)) / (e(c_1, c_1) - e(c_1, c_2))] \cdot [P(c_2) / P(c_1)] = \Theta

64 Two Class Assign to class c_1 if
P(c_1 \mid d) \ge (e(c_2, c_2) - e(c_2, c_1)) / (e(c_1, c_1) - e(c_1, c_2) + e(c_2, c_2) - e(c_2, c_1)) = \Theta

65 Major Problem Estimating P_T(d \mid c) from data is a major problem in Machine Learning.

66 Mid Term Project Part 1
Measurement Space: D = L_1 \times L_2 \times \cdots \times L_N. A measurement d has N components, d = (d_1, ..., d_N).
Input |L_n| = M_n, n = 1, ..., N
Given (d_1, ..., d_N), compute the linear address
Input K, the number of classes
Input e, the K \times K economic gain matrix
Input P(d \mid c), the class conditional probabilities
Input P(c), the prior probabilities
Compute P(c \mid d)
Compute the Discrete Bayes Rule f_d(c)
Compute the K \times K Confusion Matrix
Compute the Expected Gain
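
One possible sketch of the linear-address step (row-major, mixed-radix indexing); the component cardinalities M_n and the example measurement are assumptions, and the project may specify a different addressing convention.

```python
# Linear address of d = (d_1, ..., d_N), where component n takes values
# 0 .. M_n - 1, using row-major (last index varies fastest) ordering.
# The cardinalities and the example measurement are assumed values.
def linear_address(d, M):
    addr = 0
    for d_n, M_n in zip(d, M):
        addr = addr * M_n + d_n
    return addr

M = [4, 3, 5]                  # assumed component cardinalities M_1..M_N
d = (2, 1, 3)                  # assumed measurement, zero-based components

print(linear_address(d, M))    # 2*(3*5) + 1*5 + 3 = 38
```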
