Probability Review and Naive Bayes. Mladen Kolar and Rob McCulloch

1 Probability Review and Naive Bayes Mladen Kolar and Rob McCulloch

2 1. Probability and Graphical Models 2. Quick Probability Review 3. The Sinus Example 4. Bayesian Networks 5. Bayes Rule Classification 6. Naive Bayes Classification 7. Sentiment Analysis: Spam or Ham

3 1. Probability and Graphical Models Joint probability distributions help us understand uncertain relationships between many variables. Let X = (X_1, X_2, ..., X_p) be a vector of random variables. Let x = (x_1, x_2, ..., x_p) be an outcome. If x is continuous we write P(X ∈ A) = ∫_A p(x) dx. If x is discrete we write P(X ∈ A) = Σ_{x ∈ A} p(x). 1

4 Figuring out p(x) and understanding it in high dimensions is a very difficult and very important problem. As usual, to make things manageable we look for simplifying structure. The simplest structure is independence: p(x_1, x_2, ..., x_p) = p(x_1) p(x_2) ... p(x_p) = ∏_i p(x_i). 2

5 But complete independence is completely boring. It rules out relationships amongst the x's!!!! Instead we will look for conditional independence. A pair of variables may not be independent, but they may be independent conditional on knowledge of a third!! 3

6 In ML/DS, graphical models give us a nice way to think about conditional independence structures in joint distributions. Generally, graphical models capture various kinds of conditional independence. We will quickly review basic probability notions, using some of the language associated with graphical models. Later in the course we may study graphical models more systematically. 4

7 2. Quick Probability Review Note: we will use notation appropriate for discrete random variables: p(x_1, x_2) = P(X_1 = x_1, X_2 = x_2), and so on. Two Variables: Suppose we have the joint distribution p(x_1, x_2); then we can compute the marginal p(x_1) (or p(x_2)) and the conditional p(x_2 | x_1) = p(x_1, x_2) / p(x_1) (or p(x_1 | x_2)). 5
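
A tiny R illustration (with a made-up joint for two binary variables) of computing a marginal and a conditional from the joint:

# Made-up joint p(x1, x2) for two binary variables.
joint <- matrix(c(0.30, 0.10,
                  0.20, 0.40),
                nrow = 2, byrow = TRUE,
                dimnames = list(x1 = c("0", "1"), x2 = c("0", "1")))
rowSums(joint)                     # marginal p(x1)
joint["1", ] / sum(joint["1", ])   # conditional p(x2 | x1 = 1)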

8 We can always get (x_1, x_2) by first getting x_1 and then getting x_2 conditional on x_1: p(x_1, x_2) = p(x_1) p(x_2 | x_1). X_1 and X_2 are independent if p(x_2 | x_1) = p(x_2) ∀ (x_1, x_2), so that p(x_1, x_2) = p(x_1) p(x_2). X_1 and X_2 are independent if no matter what you learn about X_1, your beliefs about X_2 are the same as if you knew nothing about X_1. 6

9 Three Variables: Have p(x_1, x_2, x_3) = P(X_1 = x_1, X_2 = x_2, X_3 = x_3). Then, p(x_3 | x_1, x_2) = p(x_1, x_2, x_3) / p(x_1, x_2) and p(x_3, x_2 | x_1) = p(x_1, x_2, x_3) / p(x_1). 7

10 Note: p(x_3 | x_1, x_2) ∝ p(x_1, x_2, x_3) and p(x_3, x_2 | x_1) ∝ p(x_1, x_2, x_3). The conditional is proportional to the joint. We always have: p(x_1, x_2, x_3) = p(x_1) p(x_2, x_3 | x_1) = p(x_1) p(x_2 | x_1) p(x_3 | x_1, x_2). 8

11 Conditional Independence: X_3 is independent of X_2 conditional on X_1 if p(x_3 | x_2, x_1) = p(x_3 | x_1) ∀ (x_1, x_2, x_3). Given you know X_1 = x_1, X_2 tells you nothing about X_3. We write X_2 ⊥ X_3 | X_1. 9

12 If X_2 ⊥ X_3 | X_1 then, p(x_1, x_2, x_3) = p(x_1) p(x_2 | x_1) p(x_3 | x_1, x_2) = p(x_1) p(x_2 | x_1) p(x_3 | x_1). It is this kind of simplification of the joint structure that graphical models are meant to capture. 10
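
As a sanity check, here is a small R sketch (all numbers made up) that builds a joint of exactly this form and verifies the simplification numerically:

# p(x1), p(x2|x1), p(x3|x1) for binary variables coded 1/2; values are illustrative.
p_x1     <- c(0.3, 0.7)                      # indexed by x1 = 1, 2
p_x2_gx1 <- rbind(c(0.9, 0.1), c(0.4, 0.6))  # row = x1, col = x2
p_x3_gx1 <- rbind(c(0.2, 0.8), c(0.7, 0.3))  # row = x1, col = x3

# Build the joint p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x1).
joint <- array(0, dim = c(2, 2, 2))
for (i in 1:2) for (j in 1:2) for (k in 1:2)
  joint[i, j, k] <- p_x1[i] * p_x2_gx1[i, j] * p_x3_gx1[i, k]

# p(x3 = 1 | x1 = 1, x2 = 2) equals p(x3 = 1 | x1 = 1), as it should:
joint[1, 2, 1] / sum(joint[1, 2, ])
sum(joint[1, , 1]) / sum(joint[1, , ])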

13 If X_2 ⊥ X_3 | X_1 then we could have, p(x_1, x_2, x_3) = p(x_1) p(x_3 | x_1) p(x_2 | x_1). This is a directed graphical model. Each variable is a node in the graph. The arrows pointing into a node tell you which (of the previous) variables it depends on. 11

14 If X_1 ⊥ X_3 | X_2 then we could have, p(x_1, x_2, x_3) = p(x_1) p(x_2 | x_1) p(x_3 | x_2). 12

15 Complete independence: a graph with no edges. Complete dependence: a fully connected DAG. (Graphs not shown.) 13

16 Our graphs are DAGs: Directed Acyclic Graphs: Directed: The connections are arrows that point from one node (variable) to another. Acyclic: You can't start at a node and follow the arrows back to the same node. 14

17 Several Variables: Our definitions extend to many variables easily by considering groups. Suppose we have (X_1, X_2, ..., X_p). Let G = {i_1, i_2, ..., i_q} ⊆ {1, 2, ..., p} and then X_G = {X_i : i ∈ G}. We can say one group of variables is independent of another group conditional on a third group: X_G3 ⊥ X_G2 | X_G1. 15

18 Conditional independence can be a very powerful way to capture the assumptions in a model. For example in the model: Y = Xβ + ε, ε ∼ N(0, σ²), ε ⊥ X, where X is considered a random vector, we have: Y ⊥ X | Xβ. Given you know Xβ, X has no more information about Y. 16

19 3. The Sinus Example Suppose we know the following: The flu causes sinus inflammation. Allergies cause sinus inflammation. Sinus inflammation causes a runny nose. Sinus inflammation causes headaches. 17

20 The flu causes sinus inflammation. Allergies cause sinus inflammation. Sinus inflammation causes a runny nose. Sinus inflammation causes headaches. Note: In this example we explicitly interpret the relationships as causal. More generally they may just be observational. 18

21 Let F = 1 if a person has the flu, 0 else, A = 1 if a person has allergies, 0 else, S = 1 if a person has sinus inflammation, 0 else, H = 1 if a person has a headache, 0 else, N = 1 if a person has a runny nose, 0 else. Then, p(f, a, s, h, n) = p(f) p(a) p(s | f, a) p(h | s) p(n | s). 19

22 The absence of edges contains information!!! p(f, a, s, h, n) = p(f) p(a) p(s | f, a) p(h | s) p(n | s), so, for example, (H, N) ⊥ (F, A) | S. F affects N, but only through S. 20

23 4. Bayesian Networks A Bayesian Network is a DAG: a Directed Acyclic Graphical representation of the joint distribution of X = (X_1, X_2, ..., X_p). Given a specific ordering of the variables, let P_i be a set of variables such that p(x_i | x_1, x_2, ..., x_{i-1}) = p(x_i | P_i), where P_i are the parents of X_i, P_i ⊆ {x_1, x_2, ..., x_{i-1}}. Then, p(x_1, x_2, ..., x_p) = ∏_i p(x_i | P_i). Local Markov Assumption: A variable X is independent of its non-descendants given its parents (only the parents). 21

24 Bayes Nets: Algorithms for fitting DAG structures to data are an important part of Machine Learning/Data Science/Artificial Intelligence. Also important are algorithms for computing conditionals and marginals. In our sinus example we might want to know p(A = 1 | N = 1). 22
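
The slides don't show code for this query, but a brute-force R sketch, with made-up conditional probability tables, could compute p(A = 1 | N = 1) by summing the joint over all 2^5 configurations:

# Sinus network with illustrative CPTs: p(f,a,s,h,n) = p(f) p(a) p(s|f,a) p(h|s) p(n|s).
p_f <- 0.10; p_a <- 0.20                                        # p(F=1), p(A=1)
p_s <- function(f, a) c(0.05, 0.70, 0.80, 0.95)[1 + f + 2 * a]  # p(S=1 | f, a)
p_h <- function(s) ifelse(s == 1, 0.60, 0.10)                   # p(H=1 | s)
p_n <- function(s) ifelse(s == 1, 0.75, 0.20)                   # p(N=1 | s)

grid <- expand.grid(f = 0:1, a = 0:1, s = 0:1, h = 0:1, n = 0:1)
joint <- with(grid,
  ifelse(f == 1, p_f, 1 - p_f) * ifelse(a == 1, p_a, 1 - p_a) *
  ifelse(s == 1, p_s(f, a), 1 - p_s(f, a)) *
  ifelse(h == 1, p_h(s), 1 - p_h(s)) *
  ifelse(n == 1, p_n(s), 1 - p_n(s)))

sum(joint[grid$a == 1 & grid$n == 1]) / sum(joint[grid$n == 1])  # p(A=1 | N=1)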

25 An important example of a Machine Learning technique which exploits a certain conditional independence structure is the Naive Bayes Classifier. We will study NB below. 23

26 5. Bayes Rule Classification Consider the directed data mining problem with a categorical Y. Many methods can be viewed as an attempt to estimate: p(y | x), the conditional distribution of Y given X = x. For example in logistic regression we have: Y | x ∼ Bernoulli(p(x)), p(x) = exp(xβ) / (1 + exp(xβ)). 24
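
For concreteness, a small simulated R example (not from the slides) of estimating p(y | x) directly with logistic regression:

set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-1 + 2 * x))               # true p(y=1|x) is logistic in x
fit <- glm(y ~ x, family = binomial)                  # estimates p(y | x) directly
predict(fit, data.frame(x = 0.5), type = "response")  # estimated p(y = 1 | x = 0.5)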

27 An alternative approach is to estimate the full joint distribution of (X, Y) by estimating the marginal for Y (p(y)) and the conditional for X (p(x | y)). We then have the joint via: p(x, y) = p(y) p(x | y). And classification is then obtained from Bayes Theorem: p(y | x) = p(y) p(x | y) / p(x) ∝ p(y) p(x | y). As usual we can predict the most probable y or make a decision based on the probabilities. 25

28 In general, it seems more straightforward to estimate p(y | x) directly, but some well known approaches take the Bayes-rule path. For example, for x numeric, classic discriminant analysis assumes x | y ∼ N(µ_y, Σ). Remember, Y is discrete and we have to estimate p(x | y) for each possible y. 26
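
A sketch of this Bayes-rule path using linear discriminant analysis via MASS::lda, with the built-in iris data standing in for a generic data set:

library(MASS)
ldafit <- lda(Species ~ ., data = iris)   # assumes x | y ~ N(mu_y, Sigma), shared Sigma
pred   <- predict(ldafit, iris)
head(pred$posterior)                      # estimated p(y | x) for each class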

29 6. Naive Bayes Classification Naive Bayes classification simplifies the problem by assuming that the elements of X are conditionally independent given Y: p(x, y) = p(y) ∏_i p(x_i | y), so that p(y | x) ∝ p(y) ∏_i p(x_i | y). Each coordinate x_i of x gets to multiply in its own contribution of evidence about y, depending on how likely x_i would be if Y = y. Given x, a simple approach to classification just guesses the y having the largest p(y | x). 27
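
Here is a minimal hand-rolled version of this calculation in R, with two classes and two binary word indicators; all the probabilities are made up for illustration:

prior <- c(ham = 0.86, spam = 0.14)
# p(x_i = "Yes" | y) for each word (feature), by class (made-up values):
p_yes <- rbind(ham  = c(word1 = 0.05, word2 = 0.02),
               spam = c(word1 = 0.40, word2 = 0.30))
x <- c(word1 = "Yes", word2 = "No")          # an observed document

lik  <- apply(p_yes, 1, function(p) prod(ifelse(x == "Yes", p, 1 - p)))
post <- prior * lik / sum(prior * lik)       # p(y | x), proportional to prior * likelihood
round(post, 3)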

31 NB has some key advantages: We only have to estimate the low-dimensional p(x_i | y) instead of the high-dimensional p(x | y)!!!! Many small bits of information from each x_i can be combined. It is simple. The main disadvantage is that the conditional independence assumption often seems inappropriate. However, this does not seem to keep it from working very well in practice!!! According to Mladen, NB is the single most used classifier out there. NB often performs well, even when the assumption is violated. 29

32 7. Sentiment Analysis: Spam or Ham Sentiment analysis tries to classify text documents. A popular approach is to combine bag of words with NB. Each word in the document provides an additional independent piece of evidence about the kind of document it is. A simple example is trying to classify the document as spam or not: spam or ham. 30

33 Bag of words means just that: we ignore the order of the words. The document: When the lecture is over, remember to wake up the person sitting next to you in the lecture room. is the same as the document, in is lecture lecture next over person remember room sitting the the the to to up wake when you 31
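
In R, the bag-of-words representation of a document is essentially just a word-count table:

doc <- "when the lecture is over remember to wake up the person sitting next to you in the lecture room"
sort(table(strsplit(doc, " ")[[1]]), decreasing = TRUE)   # word counts, order ignored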

34 Sentiment Analysis has been used very creatively: Classify a bunch of discussion posts on the economy at time t and then predict whether the stock market will go up at time t + 1 given how many are classified as positive. Classify a bunch of discussion posts as favorable to candidate A or candidate B and then predict how they will do in the next poll or the election. 32

35 SMS Spam Data: Note: this follows Chapter 4 of Machine Learning with R, by Brett Lantz. Note: sms: short message service. Have 5,559 sms text message documents. Each one is labelled as spam or ham. Here are the first (ham) and fourth (spam) observations: > smsraw[1,] type text 1 ham Hope you are having a good week. Just checking in > smsraw[4,] type 4 spam 4 complimentary 4 STAR Ibiza Holiday or 10,000 cash needs your URGENT collection NOW from Landline not to lose out! Box434SK38WP150PPM18+ 33

36 Work flow: clean: tolower, kill numbers, punctuation, stopwords stem: (help,helped,helping,helps) becomes (help,help,help,help) tokenization: split a document up into single words (or tokens or terms). document term matrix (DTM): rows indicate documents, columns are counts for terms. train/test split. throw away low count terms. convert DTM to indicators: Yes if the word (term) is in the document, No else. do Naive Bayes!! (A code sketch of these steps is given below.) 34
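
The processing code itself is not shown on the slide; a sketch of these steps with the tm and SnowballC packages might look like this (object names follow the slides, but this is not necessarily the authors' exact code):

library(tm)
library(SnowballC)

smscc <- VCorpus(VectorSource(smsraw$text))                 # raw text -> corpus
smscc <- tm_map(smscc, content_transformer(tolower))        # lower case
smscc <- tm_map(smscc, removeNumbers)                       # kill numbers
smscc <- tm_map(smscc, removePunctuation)                   # kill punctuation
smscc <- tm_map(smscc, removeWords, stopwords("english"))   # kill stopwords
smscc <- tm_map(smscc, stemDocument)                        # stem: helped -> help, ...
smscc <- tm_map(smscc, stripWhitespace)                     # tidy up spaces

smsdtm <- DocumentTermMatrix(smscc)   # tokenize; rows = documents, columns = term counts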

37 Note: Most of the work is processing the data!!!!! This is typically the case in real world applications. Getting the data into a form that allows you to analyze it is time consuming and very important. Garbage in, garbage out!!! In class we will typically focus on getting a basic understanding of the methods and not emphasize the data wrangling. 35

38 Clean and Stem Here are the first two documents: > smsraw$text[1] [1] "Hope you are having a good week. Just checking in" > smsraw$text[2] [1] "K..give back my thanks." Here are the first 2 docs after cleaning. smscc is the cleaned data in the Corpus data structure from the tm R package. smscc is for sms data as a Cleaned Corpus. > smscc[[1]][1] $content [1] "hope good week just check" > smscc[[2]][1] $content [1] "kgive back thank" 36

39 Tokenize and get DTM Tokenization gives us 6518 words (or terms) from all the sms documents. The i-th row of the DTM gives us the count for each term in document i. > print(dim(smsdtm)) [1] 5559 6518 > library(slam) #for col_sums > summary(col_sums(smsdtm)) #summarize total times a term is used. Min. 1st Qu. Median Mean 3rd Qu. Max. > terms = smsdtm$dimnames$Terms > nterm = length(terms) > set.seed(14) > ii = sample(1:nterm,20) > terms[ii] [1] "effect" "pinku" "wikipediacom" "mundh" "wwwsmsconet" [6] "marsm" "voic" "itz" "logo" "hip" [11] "transfr" "colin" "leo" "technolog" "text" [16] "scratch" "graze" "prolli" "tech" "ofsi" 37

40 Word frequencies for ham: (plot not shown). Word frequencies for spam: (plot not shown). 38

41 Train/Test #train and test # creating training and test datasets smstrain = smsdtm[1:4169, ] smstest = smsdtm[4170:5559, ] # also save the labels smstrainy = smsraw[1:4169, ]$type smstesty = smsraw[4170:5559, ]$type > prop.table(table(smstrainy)) smstrainy ham spam > prop.table(table(smstesty)) smstesty ham spam

42 Throw Away Terms with Low Frequency # save frequently-appearing terms to a character vector smsfreqwords = findFreqTerms(smstrain, 5) > str(smsfreqwords) chr [1:1136] "abiola" "abl" "abt" "accept" "access" "account"... > length(smsfreqwords) [1] 1136 # create DTMs with only the frequent terms smsfreqtrain = smstrain[, smsfreqwords] smsfreqtest = smstest[, smsfreqwords] > dim(smsfreqtrain) [1] 4169 1136 > dim(smstest) [1] 1390 6518 40

43 Convert Counts to Indicators #convert counts to if(count>0) (Yes,No) convertcounts <- function(x) { x <- ifelse(x > 0, "Yes", "No") } # apply() convertcounts() to columns of train/test data # these are just matrices smstrain = apply(smsfreqtrain, MARGIN = 2, convertcounts) smstest <- apply(smsfreqtest, MARGIN = 2, convertcounts) > dim(smstrain) [1] 4169 1136 > smstrain[1:3,1:5] Terms Docs abiola abl abt accept access 1 "No" "No" "No" "No" "No" 2 "No" "No" "No" "No" "No" 3 "No" "No" "No" "No" "No" 41

44 We are ready for NB!!! library(e1071) smsnb = naiveBayes(smstrain, smstrainy) > smsnb$tables[1:3] $abiola abiola smstrainy No Yes ham spam $abl abl smstrainy No Yes ham spam $abt abt smstrainy No Yes ham spam The tables are our p(x_i | y) terms!! y is ham or spam and the x_i are the words (terms): abiola, abl, abt, ... That is, p(abiola = Yes | y = ham) is the Yes entry in the ham row of the abiola table. 42

45 $age age smstrainy No Yes ham spam $adult adult smstrainy No Yes ham spam What is p(y = spam | age = Yes)? (The prob the sms is spam given the word age is in it). Let's use p(y = spam) = .14, the training data proportion. p(y = spam | age = Yes) = p(y=spam) p(age=Yes | y=spam) / [ p(y=spam) p(age=Yes | y=spam) + p(y=ham) p(age=Yes | y=ham) ] >.14* /(.14* * ) [1] 43
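
The same update wrapped in a small R helper; the conditional probabilities in the call below are placeholders standing in for the fitted entries of smsnb$tables:

# Two-class Bayes update p(spam | word = Yes).
posterior_spam <- function(prior_spam, p_word_spam, p_word_ham) {
  num <- prior_spam * p_word_spam
  den <- num + (1 - prior_spam) * p_word_ham
  num / den
}
posterior_spam(prior_spam = 0.14, p_word_spam = 0.05, p_word_ham = 0.001)  # placeholder values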

46 $age age smstrainy No Yes ham spam $adult adult smstrainy No Yes ham spam What is p(y = spam | age = Yes, adult = Yes)? (The prob the sms is spam given the word age is in it and the word adult is in it). p(y = spam | age = Yes, adult = Yes) = p(spam) p(age=Yes | spam) p(adult=Yes | spam) / [ p(spam) p(age=Yes | spam) p(adult=Yes | spam) + p(ham) p(age=Yes | ham) p(adult=Yes | ham) ] >.14* * /(.14* * * * ) [1] 44

47 Out of Sample Confusion Matrix yhat = predict(smsnb,smstest) library(gmodels) CrossTable(yhat, smstesty, prop.chisq = FALSE, prop.t = FALSE, prop.r = FALSE, dnn = c('predicted', 'actual')) actual predicted ham spam Row Total ham spam Column Total Misclassification rate: 36/1390 ≈ .026. % spam detected: .836. Not bad!! 45
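
The same summary is also available with base R, using the yhat and smstesty objects above:

table(predicted = yhat, actual = smstesty)   # same counts as CrossTable, in base R
mean(yhat != smstesty)                       # out-of-sample misclassification rate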

48 Try it again with laplace=1, add 1 to each count when estimating the 2x2 tables. > smsnb2 = naiveBayes(smstrain, smstrainy,laplace=1) > smsnb2$tables[1:3] $abiola abiola smstrainy No Yes ham spam $abl abl smstrainy No Yes ham spam $abt abt smstrainy No Yes ham spam Got rid of the 0/1 conditional probabilities, which are extreme. 46
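
For a binary indicator x_i, this presumably corresponds to the usual add-one (Laplace) estimate:

\hat{p}(x_i = \mathrm{Yes} \mid y) = \frac{\#\{\text{docs in class } y \text{ with } x_i = \mathrm{Yes}\} + 1}{\#\{\text{docs in class } y\} + 2}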

49 > yhat2 = predict(smsnb2,smstest) > CrossTable(yhat2, smstesty, + prop.chisq = FALSE, prop.t = FALSE, prop.r = FALSE, + dnn = c('predicted', 'actual')) actual predicted ham spam Row Total ham spam Column Total Not too different. 47
