# Learning Bayesian belief networks

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Lecture 4 Learning Bayesian belief networks Milos Hauskrecht 5329 Sennott Square Administration Midterm: Monday, March 7, 2003 In class Closed book Material covered by Wednesday, March 2, including learning parameters of the BBNs but not the structure learning Last year midterm is posted on the web No new homework

2 Learning probability distribution Basic settings: A set of random variables = {, 2, K, n} A model of the distribution over variables in with parameters Θ Data D = D, D,.., D } { 2 N Obective: find parameters Θˆ that describe the data the best Learning Bayesian belief networks: parameterizations as defined by the structure of network Learning of BBN Learning. Learning of parameters of conditional probabilities Learning of the network structure Variables: Observable values present in every data sample Hidden they values are never observed in data Missing values values sometimes present, sometimes not Next: All variables are observable. Learning of parameters of BBN 2. Learning of the model (BBN structure

3 Learning of parameters of BBN Idea: decompose the estimation problem for the full oint over a large number of variables to a set of smaller estimation problems corresponding to parent-variable conditionals. Example: Assume A,E,B are binary with rue, alse values B A 4 estimation problems E A B=,E= A B=,E= A B,E A B=,E= A B=,E= Assumption that enables the decomposition: parameters of conditional distributions are independent Estimates of parameters of BBN wo assumptions that permit the decomposition: Sample independence D Θ, ξ = Θ, ξ u= Parameter independence N D u n i Θ D, ξ = θ D, ξ q i= = Parameters of each conditional (one for every assignment of values to parent variables can be learned independently

4 Learning of BBN parameters. Example. Example:?? HWBC Pneum Pn???? ever Palen Pneum ever Pneum Pneum??? Learning of BBN parameters. Example. Data D (different patient cases: Pal ev Cou HWB Pneu ever

5 Estimates of parameters of BBN Much like multiple coin toss or roll of a dice problems. A smaller learning problem corresponds to the learning of exactly one conditional distribution Example: ever = Problem: How to pick the data to learn? Estimates of parameters of BBN Much like multiple coin toss or roll of a dice problems. A smaller learning problem corresponds to the learning of exactly one conditional distribution Example: ever = Problem: How to pick the data to learn? Answer:. Select data points with = (ignore the rest 2. ocus on (select only values of the random variable defining the distribution (ever 3. Learn the parameters of the conditional the same way as we learned the parameters of the biased coin or dice

6 Learning of BBN parameters. Example. Learn: ever = Step : Select data points with = Pal ev Cou HWB Pneu ever Learning of BBN parameters. Example. Learn: Step : ever = Ignore the rest Pal ev Cou HWB Pneu ever

7 Learning of BBN parameters. Example. Learn: ever = Step 2: Select values of the random variable defining the distribution of ever Pal ev Cou HWB Pneu ever Learning of BBN parameters. Example. Learn: ever = Step 2: Ignore the rest ev ever

8 Learning of BBN parameters. Example. Learn: ever = Step 3a: Learning the ML estimate ev ever ever = Learning of BBN parameters. Bayesian learning. Learn: ever = Step 3b: Learning the Bayesian estimate Assume the prior ev θ ever = ~ Beta(3,4 ever Posterior: ~ Beta(6,6 θ ever =

9 Naïve Bayes model A special (simple Bayesian belief network used as a generative classifier model Class variable Y Attributes are independent given Y x Y = i, Θ = x Learning: ML, Bayesian estimates of parameters Classification: given x we need to determine the class Choose the class with the maximum posterior Y = i x, Θ = k n = = Y = i, Θ Y = i Θ x Y = i, Θ Y = Θ x Y =, Θ Class Y 2.. n Naïve Bayes with Gaussians distributions Generative classification model, Y. Priors on classes Y p ( Y =, p ( Y = 2, p ( Y = 3,... Y Before: Joint class conditional densities (for x p ( x µ, Σ = exp ( x µ ( / 2 / 2 Σ x µ d 2 (2π Σ p (Y Now: Naïve Bayes - independent class conditional densities Y 2 xi µ i, σ i = exp ( x 2 i µ i Y (2π σ i 2σ i Y 2.. n

10 Naïve Bayes with Gaussians distributions How to learn the generative model, Y. Priors on classes p ( Y =, p ( Y = 2, p ( Y = 3,...? 2. Class conditional densities 2 xi µ i, σ i = exp ( x 2 i µ i (2π σ i 2σ i Y 2.. Y n Model selection BBN has two components: Structure of the network (models conditional independences A set of parameters (conditional child-parent distributions We already know how to learn the parameters for the fixed structure But how to learn the structure of the BBN? Assumption: All variables are observable in the dataset

11 Learning the structure Criteria we can choose to score the structure S Marginal likelihood maximize P ( D S, ξ ξ - represents the prior knowledge Posterior probability maximize P ( S D, ξ P ( S D, ξ = P ( D S, ξ P ( S P ( D ξ ξ How to compute marginal likelihood P ( D S, ξ? Learning of BBNs Notation: i ranges over all possible variables i=,..,n =,..,q ranges over all possible parent combinations k=,..,r ranges over all possible variable values Θ - parameters of the BBN θ is a vector of θ k representing parameters of the conditional r probability distribution; such that θk = Nk N k = k = - a number of instances in the dataset where parents of variable i take on values and i has value k r = N k = r k = k - prior counts (parameters of Beta and Dirichlet priors k

12 Marginal likelihood Integrate over all possible parameter settings Θ P ( D S, ξ = D S, Θ, ξ Θ S, ξ dθ Using the assumption of parameter and sample independence P ( D S, ξ = n qi r i k k i = = Γ( + N k = Γ( k Γ( We can use log-likelihood score instead log D S, ξ = Γ( + N n qi r Γ( i Γ( k + N k log + log = = Γ( + N = Γ( i Score is decomposable along variables!!! k k rick to compute the marginal likelihood Integrate over all possible parameter settings Posterior of parameters, given data and the structure Gives the solution Θ P ( D ξ = D Θ, ξ Θ ξ dθ D Θ, ξ Θ ξ Θ D, ξ = D ξ rick D Θ, ξ Θ ξ D ξ = Θ D, ξ D ξ = n qi r i k k i= = Γ( + N k = Γ( k Γ( Γ( + N

13 Learning the structure Likelihood of data for the BBN (structure and parameters D Θ, ξ measures the goodness of fit of the BBN to data Marginal likelihood (for the structure only P ( D S, ξ Does not measure only a goodness of fit. It is: different for structures of different complexity Incorporates preferences towards simpler structures, implements Occam s razor!!!! Occam s Razor Why there is a preference towards simpler structures? Rewrite marginal likelihood as D S, ξ = We know that Θ Θ D S, Θ, ξ Θ S, ξ dθ Θ Θ S, ξ dθ Θ ξ dθ = Interpretation: in more complex structures there are more ways how parameters can be set badly he numerator: count of good assignments he denominator: count of all assignments

14 Approximations of probabilistic scores Approximations of the marginal likelihood and posterior scores Information based measures Akaike criterion Bayesian information criterion (BIC Minimum description length (MDL Reflect the tradeoff between the fit to data and preference towards simpler structures Example: Akaike criterion. Maximize: score( S = log D ΘML, ξ compl(s Bayesian information criterion (BIC Maximize: score( S = log D ΘML, ξ compl(s logn 2 Optimizing the structure inding the best structure is a combinatorial optimization problem A good feature: the score is decomposable along variables: n q Γ Γ + = i r ( + i ( N k k log P ( D S, ξ log log i = = Γ ( + N k = Γ ( k Algorithm idea: Search the space of structures using local changes (additions and deletions of a link Advantage: we do not have to compute the whole score from scratch Recompute the partial score for the affected variable

15 Optimizing the structure. Algorithms Greedy search Start from structure with no links Add a link that yields the best score improvement Metropolis algorithm (with simulated annealing Local additions and deletions Avoids being trapped in local optimal

### Learning Bayesian networks

1 Lecture topics: Learning Bayesian networks from data maximum likelihood, BIC Bayesian, marginal likelihood Learning Bayesian networks There are two problems we have to solve in order to estimate Bayesian

### Bayesian belief networks. Inference.

Lecture 13 Bayesian belief networks. Inference. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Midterm exam Monday, March 17, 2003 In class Closed book Material covered by Wednesday, March 12 Last

### Naïve Bayes classification

Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

### Bayesian Methods: Naïve Bayes

Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior

### Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

### Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

### Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

### Bayesian Learning (II)

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP

### Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

### MLE/MAP + Naïve Bayes

10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University MLE/MAP + Naïve Bayes Matt Gormley Lecture 19 March 20, 2018 1 Midterm Exam Reminders

### CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES Chapter 8&9: Classification: Part 3 Instructor: Yizhou Sun yzsun@ccs.neu.edu March 12, 2013 Midterm Report Grade Distribution 90-100 10 80-89 16 70-79 8 60-69 4

### Generative Models for Discrete Data

Generative Models for Discrete Data ddebarr@uw.edu 2016-04-21 Agenda Bayesian Concept Learning Beta-Binomial Model Dirichlet-Multinomial Model Naïve Bayes Classifiers Bayesian Concept Learning Numbers

### Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a

### σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

### 4: Parameter Estimation in Fully Observed BNs

10-708: Probabilistic Graphical Models 10-708, Spring 2015 4: Parameter Estimation in Fully Observed BNs Lecturer: Eric P. Xing Scribes: Purvasha Charavarti, Natalie Klein, Dipan Pal 1 Learning Graphical

### Unsupervised Learning

Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

### The Naïve Bayes Classifier. Machine Learning Fall 2017

The Naïve Bayes Classifier Machine Learning Fall 2017 1 Today s lecture The naïve Bayes Classifier Learning the naïve Bayes Classifier Practical concerns 2 Today s lecture The naïve Bayes Classifier Learning

### Statistical learning. Chapter 20, Sections 1 3 1

Statistical learning Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete

### Probabilistic Graphical Models

Parameter Estimation December 14, 2015 Overview 1 Motivation 2 3 4 What did we have so far? 1 Representations: how do we model the problem? (directed/undirected). 2 Inference: given a model and partially

### Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

### Probability Based Learning

Probability Based Learning Lecture 7, DD2431 Machine Learning J. Sullivan, A. Maki September 2013 Advantages of Probability Based Methods Work with sparse training data. More powerful than deterministic

### Bayesian Learning. Artificial Intelligence Programming. 15-0: Learning vs. Deduction

15-0: Learning vs. Deduction Artificial Intelligence Programming Bayesian Learning Chris Brooks Department of Computer Science University of San Francisco So far, we ve seen two types of reasoning: Deductive

### Introduction to Machine Learning Midterm Exam Solutions

10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,

### UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

### Bayesian Machine Learning

Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September

### Machine Learning 4771

Machine Learning 4771 Instructor: Tony Jebara Topic 11 Maximum Likelihood as Bayesian Inference Maximum A Posteriori Bayesian Gaussian Estimation Why Maximum Likelihood? So far, assumed max (log) likelihood

### An Empirical-Bayes Score for Discrete Bayesian Networks

An Empirical-Bayes Score for Discrete Bayesian Networks scutari@stats.ox.ac.uk Department of Statistics September 8, 2016 Bayesian Network Structure Learning Learning a BN B = (G, Θ) from a data set D

### CPSC 540: Machine Learning

CPSC 540: Machine Learning Empirical Bayes, Hierarchical Bayes Mark Schmidt University of British Columbia Winter 2017 Admin Assignment 5: Due April 10. Project description on Piazza. Final details coming

### Machine Learning, Fall 2012 Homework 2

0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0

### Scoring functions for learning Bayesian networks

Tópicos Avançados p. 1/48 Scoring functions for learning Bayesian networks Alexandra M. Carvalho Tópicos Avançados p. 2/48 Plan Learning Bayesian networks Scoring functions for learning Bayesian networks:

### A Brief Review of Probability, Bayesian Statistics, and Information Theory

A Brief Review of Probability, Bayesian Statistics, and Information Theory Brendan Frey Electrical and Computer Engineering University of Toronto frey@psi.toronto.edu http://www.psi.toronto.edu A system

### Introduc)on to Bayesian methods (con)nued) - Lecture 16

Introduc)on to Bayesian methods (con)nued) - Lecture 16 David Sontag New York University Slides adapted from Luke Zettlemoyer, Carlos Guestrin, Dan Klein, and Vibhav Gogate Outline of lectures Review of

### Statistical learning. Chapter 20, Sections 1 4 1

Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete

### CS 6140: Machine Learning Spring 2016

CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa?on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Logis?cs Assignment

### Learning Bayesian Networks for Biomedical Data

Learning Bayesian Networks for Biomedical Data Faming Liang (Texas A&M University ) Liang, F. and Zhang, J. (2009) Learning Bayesian Networks for Discrete Data. Computational Statistics and Data Analysis,

### Learning With Bayesian Networks. Markus Kalisch ETH Zürich

Learning With Bayesian Networks Markus Kalisch ETH Zürich Inference in BNs - Review P(Burglary JohnCalls=TRUE, MaryCalls=TRUE) Exact Inference: P(b j,m) = c Sum e Sum a P(b)P(e)P(a b,e)p(j a)p(m a) Deal

### CS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning

CS 446 Machine Learning Fall 206 Nov 0, 206 Bayesian Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Bayesian Learning Naive Bayes Logistic Regression Bayesian Learning So far, we

### GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The

### Statistical learning. Chapter 20, Sections 1 3 1

Statistical learning Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete

### Lecture 2: Simple Classifiers

CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 2: Simple Classifiers Slides based on Rich Zemel s All lecture slides will be available on the course website: www.cs.toronto.edu/~jessebett/csc412

### Bayes Rule. CS789: Machine Learning and Neural Network Bayesian learning. A Side Note on Probability. What will we learn in this lecture?

Bayes Rule CS789: Machine Learning and Neural Network Bayesian learning P (Y X) = P (X Y )P (Y ) P (X) Jakramate Bootkrajang Department of Computer Science Chiang Mai University P (Y ): prior belief, prior

### Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26

Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar

### 6.867 Machine learning, lecture 23 (Jaakkola)

Lecture topics: Markov Random Fields Probabilistic inference Markov Random Fields We will briefly go over undirected graphical models or Markov Random Fields (MRFs) as they will be needed in the context

### Bayesian Networks Structure Learning (cont.)

Koller & Friedman Chapters (handed out): Chapter 11 (short) Chapter 1: 1.1, 1., 1.3 (covered in the beginning of semester) 1.4 (Learning parameters for BNs) Chapter 13: 13.1, 13.3.1, 13.4.1, 13.4.3 (basic

### Parametric Techniques

Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

### Overview of Course. Nevin L. Zhang (HKUST) Bayesian Networks Fall / 58

Overview of Course So far, we have studied The concept of Bayesian network Independence and Separation in Bayesian networks Inference in Bayesian networks The rest of the course: Data analysis using Bayesian

### CPSC 540: Machine Learning

CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:

### Learning Terminological Naïve Bayesian Classifiers Under Different Assumptions on Missing Knowledge

Learning Terminological Naïve Bayesian Classifiers Under Different Assumptions on Missing Knowledge Pasquale Minervini Claudia d Amato Nicola Fanizzi Department of Computer Science University of Bari URSW

### Language as a Stochastic Process

CS769 Spring 2010 Advanced Natural Language Processing Language as a Stochastic Process Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu 1 Basic Statistics for NLP Pick an arbitrary letter x at random from any

### Learning in Bayesian Networks

Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks

### Probabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov

Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly

### Probabilistic Reasoning

Course 16 :198 :520 : Introduction To Artificial Intelligence Lecture 7 Probabilistic Reasoning Abdeslam Boularias Monday, September 28, 2015 1 / 17 Outline We show how to reason and act under uncertainty.

### PAC-learning, VC Dimension and Margin-based Bounds

More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based

### Soft Computing. Lecture Notes on Machine Learning. Matteo Matteucci.

Soft Computing Lecture Notes on Machine Learning Matteo Matteucci matteucci@elet.polimi.it Department of Electronics and Information Politecnico di Milano Matteo Matteucci c Lecture Notes on Machine Learning

### A brief introduction to Conditional Random Fields

A brief introduction to Conditional Random Fields Mark Johnson Macquarie University April, 2005, updated October 2010 1 Talk outline Graphical models Maximum likelihood and maximum conditional likelihood

### Introduction to Bayesian Learning

Course Information Introduction Introduction to Bayesian Learning Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Apprendimento Automatico: Fondamenti - A.A. 2016/2017 Outline

### p L yi z n m x N n xi

y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen

### Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

### A Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007

Decision Trees, cont. Boosting Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 1 st, 2007 1 A Decision Stump 2 1 The final tree 3 Basic Decision Tree Building Summarized

### Density Estimation. Seungjin Choi

Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

### Study Notes on the Latent Dirichlet Allocation

Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection

### The Bayes classifier

The Bayes classifier Consider where is a random vector in is a random variable (depending on ) Let be a classifier with probability of error/risk given by The Bayes classifier (denoted ) is the optimal

### CS446: Machine Learning Fall Final Exam. December 6 th, 2016

CS446: Machine Learning Fall 2016 Final Exam December 6 th, 2016 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. This exam booklet contains

### MACHINE LEARNING AND PATTERN RECOGNITION Fall 2006, Lecture 8: Latent Variables, EM Yann LeCun

Y. LeCun: Machine Learning and Pattern Recognition p. 1/? MACHINE LEARNING AND PATTERN RECOGNITION Fall 2006, Lecture 8: Latent Variables, EM Yann LeCun The Courant Institute, New York University http://yann.lecun.com

### Introduction to Graphical Models

Introduction to Graphical Models The 15 th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang 2018. 1. 9 (Tue.) Yung-Kyun Noh GENERALIZATION FOR PREDICTION 2 Probabilistic

### FEEG6017 lecture: Akaike's information criterion; model reduction. Brendan Neville

FEEG6017 lecture: Akaike's information criterion; model reduction Brendan Neville bjn1c13@ecs.soton.ac.uk Occam's razor William of Occam, 1288-1348. All else being equal, the simplest explanation is the

### Machine Learning CMPT 726 Simon Fraser University. Binomial Parameter Estimation

Machine Learning CMPT 726 Simon Fraser University Binomial Parameter Estimation Outline Maximum Likelihood Estimation Smoothed Frequencies, Laplace Correction. Bayesian Approach. Conjugate Prior. Uniform

### MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE

MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE March 28, 2012 The exam is closed book. You are allowed a double sided one page cheat sheet. Answer the questions in the spaces provided on

### CMPSCI 240: Reasoning about Uncertainty

CMPSCI 240: Reasoning about Uncertainty Lecture 17: Representing Joint PMFs and Bayesian Networks Andrew McGregor University of Massachusetts Last Compiled: April 7, 2017 Warm Up: Joint distributions Recall

### Expectation-Maximization (EM) algorithm

I529: Machine Learning in Bioinformatics (Spring 2017) Expectation-Maximization (EM) algorithm Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2017 Contents Introduce

### A.I. in health informatics lecture 2 clinical reasoning & probabilistic inference, I. kevin small & byron wallace

A.I. in health informatics lecture 2 clinical reasoning & probabilistic inference, I kevin small & byron wallace today a review of probability random variables, maximum likelihood, etc. crucial for clinical

### Approximate Inference using MCMC

Approximate Inference using MCMC 9.520 Class 22 Ruslan Salakhutdinov BCS and CSAIL, MIT 1 Plan 1. Introduction/Notation. 2. Examples of successful Bayesian models. 3. Basic Sampling Algorithms. 4. Markov

### Machine learning: lecture 20. Tommi S. Jaakkola MIT CSAIL

Machine learning: lecture 20 ommi. Jaakkola MI CAI tommi@csail.mit.edu opics Representation and graphical models examples Bayesian networks examples, specification graphs and independence associated distribution

### Introduction to Probabilistic Reasoning. Image credit: NASA. Assignment

Introduction to Probabilistic Reasoning Brian C. Williams 16.410/16.413 November 17 th, 2010 11/17/10 copyright Brian Williams, 2005-10 1 Brian C. Williams, copyright 2000-09 Image credit: NASA. Assignment

### Introduction to Probabilistic Graphical Models

Introduction to Probabilistic Graphical Models Kyu-Baek Hwang and Byoung-Tak Zhang Biointelligence Lab School of Computer Science and Engineering Seoul National University Seoul 151-742 Korea E-mail: kbhwang@bi.snu.ac.kr

### Machine Learning: Probability Theory

Machine Learning: Probability Theory Prof. Dr. Martin Riedmiller Albert-Ludwigs-University Freiburg AG Maschinelles Lernen Theories p.1/28 Probabilities probabilistic statements subsume different effects

### Model comparison and selection

BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

### Introduction to Probability for Graphical Models

Introduction to Probability for Grahical Models CSC 4 Kaustav Kundu Thursday January 4, 06 *Most slides based on Kevin Swersky s slides, Inmar Givoni s slides, Danny Tarlow s slides, Jaser Snoek s slides,

### BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1

BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013) ISLR Data Mining I Lecture

### Outline. Spring It Introduction Representation. Markov Random Field. Conclusion. Conditional Independence Inference: Variable elimination

Probabilistic Graphical Models COMP 790-90 Seminar Spring 2011 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Outline It Introduction ti Representation Bayesian network Conditional Independence Inference:

### Based on slides by Richard Zemel

CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

### Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

### Some Probability and Statistics

Some Probability and Statistics David M. Blei COS424 Princeton University February 13, 2012 Card problem There are three cards Red/Red Red/Black Black/Black I go through the following process. Close my

### Text Categorization CSE 454. (Based on slides by Dan Weld, Tom Mitchell, and others)

Text Categorization CSE 454 (Based on slides by Dan Weld, Tom Mitchell, and others) 1 Given: Categorization A description of an instance, x X, where X is the instance language or instance space. A fixed

### International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems c World Scientific Publishing Company

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems c World Scientific Publishing Company UNSUPERVISED LEARNING OF BAYESIAN NETWORKS VIA ESTIMATION OF DISTRIBUTION ALGORITHMS: AN

### Naïve Bayes. Vibhav Gogate The University of Texas at Dallas

Naïve Bayes Vibhav Gogate The University of Texas at Dallas Supervised Learning of Classifiers Find f Given: Training set {(x i, y i ) i = 1 n} Find: A good approximation to f : X Y Examples: what are

### Regression I: Mean Squared Error and Measuring Quality of Fit

Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving

### Least Squares Regression

E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute

### Notes on Markov Networks

Notes on Markov Networks Lili Mou moull12@sei.pku.edu.cn December, 2014 This note covers basic topics in Markov networks. We mainly talk about the formal definition, Gibbs sampling for inference, and maximum

### Machine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014

Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,

### Machine Learning

Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 13, 2011 Today: The Big Picture Overfitting Review: probability Readings: Decision trees, overfiting

### Probabilistic Graphical Models

Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

### ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

### Introduction to Gaussian Process

Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression

### Probability Theory. Introduction to Probability Theory. Principles of Counting Examples. Principles of Counting. Probability spaces.

Probability Theory To start out the course, we need to know something about statistics and probability Introduction to Probability Theory L645 Advanced NLP Autumn 2009 This is only an introduction; for

### Machine Learning, Midterm Exam

10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have

### Classification: The rest of the story

U NIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN CS598 Machine Learning for Signal Processing Classification: The rest of the story 3 October 2017 Today s lecture Important things we haven t covered yet Fisher

### PHASES OF STATISTICAL ANALYSIS 1. Initial Data Manipulation Assembling data Checks of data quality - graphical and numeric

PHASES OF STATISTICAL ANALYSIS 1. Initial Data Manipulation Assembling data Checks of data quality - graphical and numeric 2. Preliminary Analysis: Clarify Directions for Analysis Identifying Data Structure: