Learning Bayesian belief networks

 Beverly Shanna White
 5 months ago
 Views:
Transcription
1 Lecture 4 Learning Bayesian belief networks Milos Hauskrecht 5329 Sennott Square Administration Midterm: Monday, March 7, 2003 In class Closed book Material covered by Wednesday, March 2, including learning parameters of the BBNs but not the structure learning Last year midterm is posted on the web No new homework
2 Learning probability distribution Basic settings: A set of random variables = {, 2, K, n} A model of the distribution over variables in with parameters Θ Data D = D, D,.., D } { 2 N Obective: find parameters Θˆ that describe the data the best Learning Bayesian belief networks: parameterizations as defined by the structure of network Learning of BBN Learning. Learning of parameters of conditional probabilities Learning of the network structure Variables: Observable values present in every data sample Hidden they values are never observed in data Missing values values sometimes present, sometimes not Next: All variables are observable. Learning of parameters of BBN 2. Learning of the model (BBN structure
3 Learning of parameters of BBN Idea: decompose the estimation problem for the full oint over a large number of variables to a set of smaller estimation problems corresponding to parentvariable conditionals. Example: Assume A,E,B are binary with rue, alse values B A 4 estimation problems E A B=,E= A B=,E= A B,E A B=,E= A B=,E= Assumption that enables the decomposition: parameters of conditional distributions are independent Estimates of parameters of BBN wo assumptions that permit the decomposition: Sample independence D Θ, ξ = Θ, ξ u= Parameter independence N D u n i Θ D, ξ = θ D, ξ q i= = Parameters of each conditional (one for every assignment of values to parent variables can be learned independently
4 Learning of BBN parameters. Example. Example:?? HWBC Pneum Pn???? ever Palen Pneum ever Pneum Pneum??? Learning of BBN parameters. Example. Data D (different patient cases: Pal ev Cou HWB Pneu ever
5 Estimates of parameters of BBN Much like multiple coin toss or roll of a dice problems. A smaller learning problem corresponds to the learning of exactly one conditional distribution Example: ever = Problem: How to pick the data to learn? Estimates of parameters of BBN Much like multiple coin toss or roll of a dice problems. A smaller learning problem corresponds to the learning of exactly one conditional distribution Example: ever = Problem: How to pick the data to learn? Answer:. Select data points with = (ignore the rest 2. ocus on (select only values of the random variable defining the distribution (ever 3. Learn the parameters of the conditional the same way as we learned the parameters of the biased coin or dice
6 Learning of BBN parameters. Example. Learn: ever = Step : Select data points with = Pal ev Cou HWB Pneu ever Learning of BBN parameters. Example. Learn: Step : ever = Ignore the rest Pal ev Cou HWB Pneu ever
7 Learning of BBN parameters. Example. Learn: ever = Step 2: Select values of the random variable defining the distribution of ever Pal ev Cou HWB Pneu ever Learning of BBN parameters. Example. Learn: ever = Step 2: Ignore the rest ev ever
8 Learning of BBN parameters. Example. Learn: ever = Step 3a: Learning the ML estimate ev ever ever = Learning of BBN parameters. Bayesian learning. Learn: ever = Step 3b: Learning the Bayesian estimate Assume the prior ev θ ever = ~ Beta(3,4 ever Posterior: ~ Beta(6,6 θ ever =
9 Naïve Bayes model A special (simple Bayesian belief network used as a generative classifier model Class variable Y Attributes are independent given Y x Y = i, Θ = x Learning: ML, Bayesian estimates of parameters Classification: given x we need to determine the class Choose the class with the maximum posterior Y = i x, Θ = k n = = Y = i, Θ Y = i Θ x Y = i, Θ Y = Θ x Y =, Θ Class Y 2.. n Naïve Bayes with Gaussians distributions Generative classification model, Y. Priors on classes Y p ( Y =, p ( Y = 2, p ( Y = 3,... Y Before: Joint class conditional densities (for x p ( x µ, Σ = exp ( x µ ( / 2 / 2 Σ x µ d 2 (2π Σ p (Y Now: Naïve Bayes  independent class conditional densities Y 2 xi µ i, σ i = exp ( x 2 i µ i Y (2π σ i 2σ i Y 2.. n
10 Naïve Bayes with Gaussians distributions How to learn the generative model, Y. Priors on classes p ( Y =, p ( Y = 2, p ( Y = 3,...? 2. Class conditional densities 2 xi µ i, σ i = exp ( x 2 i µ i (2π σ i 2σ i Y 2.. Y n Model selection BBN has two components: Structure of the network (models conditional independences A set of parameters (conditional childparent distributions We already know how to learn the parameters for the fixed structure But how to learn the structure of the BBN? Assumption: All variables are observable in the dataset
11 Learning the structure Criteria we can choose to score the structure S Marginal likelihood maximize P ( D S, ξ ξ  represents the prior knowledge Posterior probability maximize P ( S D, ξ P ( S D, ξ = P ( D S, ξ P ( S P ( D ξ ξ How to compute marginal likelihood P ( D S, ξ? Learning of BBNs Notation: i ranges over all possible variables i=,..,n =,..,q ranges over all possible parent combinations k=,..,r ranges over all possible variable values Θ  parameters of the BBN θ is a vector of θ k representing parameters of the conditional r probability distribution; such that θk = Nk N k = k =  a number of instances in the dataset where parents of variable i take on values and i has value k r = N k = r k = k  prior counts (parameters of Beta and Dirichlet priors k
12 Marginal likelihood Integrate over all possible parameter settings Θ P ( D S, ξ = D S, Θ, ξ Θ S, ξ dθ Using the assumption of parameter and sample independence P ( D S, ξ = n qi r i k k i = = Γ( + N k = Γ( k Γ( We can use loglikelihood score instead log D S, ξ = Γ( + N n qi r Γ( i Γ( k + N k log + log = = Γ( + N = Γ( i Score is decomposable along variables!!! k k rick to compute the marginal likelihood Integrate over all possible parameter settings Posterior of parameters, given data and the structure Gives the solution Θ P ( D ξ = D Θ, ξ Θ ξ dθ D Θ, ξ Θ ξ Θ D, ξ = D ξ rick D Θ, ξ Θ ξ D ξ = Θ D, ξ D ξ = n qi r i k k i= = Γ( + N k = Γ( k Γ( Γ( + N
13 Learning the structure Likelihood of data for the BBN (structure and parameters D Θ, ξ measures the goodness of fit of the BBN to data Marginal likelihood (for the structure only P ( D S, ξ Does not measure only a goodness of fit. It is: different for structures of different complexity Incorporates preferences towards simpler structures, implements Occam s razor!!!! Occam s Razor Why there is a preference towards simpler structures? Rewrite marginal likelihood as D S, ξ = We know that Θ Θ D S, Θ, ξ Θ S, ξ dθ Θ Θ S, ξ dθ Θ ξ dθ = Interpretation: in more complex structures there are more ways how parameters can be set badly he numerator: count of good assignments he denominator: count of all assignments
14 Approximations of probabilistic scores Approximations of the marginal likelihood and posterior scores Information based measures Akaike criterion Bayesian information criterion (BIC Minimum description length (MDL Reflect the tradeoff between the fit to data and preference towards simpler structures Example: Akaike criterion. Maximize: score( S = log D ΘML, ξ compl(s Bayesian information criterion (BIC Maximize: score( S = log D ΘML, ξ compl(s logn 2 Optimizing the structure inding the best structure is a combinatorial optimization problem A good feature: the score is decomposable along variables: n q Γ Γ + = i r ( + i ( N k k log P ( D S, ξ log log i = = Γ ( + N k = Γ ( k Algorithm idea: Search the space of structures using local changes (additions and deletions of a link Advantage: we do not have to compute the whole score from scratch Recompute the partial score for the affected variable
15 Optimizing the structure. Algorithms Greedy search Start from structure with no links Add a link that yields the best score improvement Metropolis algorithm (with simulated annealing Local additions and deletions Avoids being trapped in local optimal
Learning Bayesian networks
1 Lecture topics: Learning Bayesian networks from data maximum likelihood, BIC Bayesian, marginal likelihood Learning Bayesian networks There are two problems we have to solve in order to estimate Bayesian
More informationBayesian belief networks. Inference.
Lecture 13 Bayesian belief networks. Inference. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Midterm exam Monday, March 17, 2003 In class Closed book Material covered by Wednesday, March 12 Last
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationBayesian Methods: Naïve Bayes
Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationBayesian Learning (II)
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationMLE/MAP + Naïve Bayes
10601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University MLE/MAP + Naïve Bayes Matt Gormley Lecture 19 March 20, 2018 1 Midterm Exam Reminders
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Chapter 8&9: Classification: Part 3 Instructor: Yizhou Sun yzsun@ccs.neu.edu March 12, 2013 Midterm Report Grade Distribution 90100 10 8089 16 7079 8 6069 4
More informationGenerative Models for Discrete Data
Generative Models for Discrete Data ddebarr@uw.edu 20160421 Agenda Bayesian Concept Learning BetaBinomial Model DirichletMultinomial Model Naïve Bayes Classifiers Bayesian Concept Learning Numbers
More informationParametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a
Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of Kmeans Hard assignments of data points to clusters small shift of a
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More information4: Parameter Estimation in Fully Observed BNs
10708: Probabilistic Graphical Models 10708, Spring 2015 4: Parameter Estimation in Fully Observed BNs Lecturer: Eric P. Xing Scribes: Purvasha Charavarti, Natalie Klein, Dipan Pal 1 Learning Graphical
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationThe Naïve Bayes Classifier. Machine Learning Fall 2017
The Naïve Bayes Classifier Machine Learning Fall 2017 1 Today s lecture The naïve Bayes Classifier Learning the naïve Bayes Classifier Practical concerns 2 Today s lecture The naïve Bayes Classifier Learning
More informationStatistical learning. Chapter 20, Sections 1 3 1
Statistical learning Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationProbabilistic Graphical Models
Parameter Estimation December 14, 2015 Overview 1 Motivation 2 3 4 What did we have so far? 1 Representations: how do we model the problem? (directed/undirected). 2 Inference: given a model and partially
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Nonparametric
More informationProbability Based Learning
Probability Based Learning Lecture 7, DD2431 Machine Learning J. Sullivan, A. Maki September 2013 Advantages of Probability Based Methods Work with sparse training data. More powerful than deterministic
More informationBayesian Learning. Artificial Intelligence Programming. 150: Learning vs. Deduction
150: Learning vs. Deduction Artificial Intelligence Programming Bayesian Learning Chris Brooks Department of Computer Science University of San Francisco So far, we ve seen two types of reasoning: Deductive
More informationIntroduction to Machine Learning Midterm Exam Solutions
10701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv BarJoseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two onepage, twosided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September
More informationMachine Learning 4771
Machine Learning 4771 Instructor: Tony Jebara Topic 11 Maximum Likelihood as Bayesian Inference Maximum A Posteriori Bayesian Gaussian Estimation Why Maximum Likelihood? So far, assumed max (log) likelihood
More informationAn EmpiricalBayes Score for Discrete Bayesian Networks
An EmpiricalBayes Score for Discrete Bayesian Networks scutari@stats.ox.ac.uk Department of Statistics September 8, 2016 Bayesian Network Structure Learning Learning a BN B = (G, Θ) from a data set D
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Empirical Bayes, Hierarchical Bayes Mark Schmidt University of British Columbia Winter 2017 Admin Assignment 5: Due April 10. Project description on Piazza. Final details coming
More informationMachine Learning, Fall 2012 Homework 2
060 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv BarJoseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0
More informationScoring functions for learning Bayesian networks
Tópicos Avançados p. 1/48 Scoring functions for learning Bayesian networks Alexandra M. Carvalho Tópicos Avançados p. 2/48 Plan Learning Bayesian networks Scoring functions for learning Bayesian networks:
More informationA Brief Review of Probability, Bayesian Statistics, and Information Theory
A Brief Review of Probability, Bayesian Statistics, and Information Theory Brendan Frey Electrical and Computer Engineering University of Toronto frey@psi.toronto.edu http://www.psi.toronto.edu A system
More informationIntroduc)on to Bayesian methods (con)nued)  Lecture 16
Introduc)on to Bayesian methods (con)nued)  Lecture 16 David Sontag New York University Slides adapted from Luke Zettlemoyer, Carlos Guestrin, Dan Klein, and Vibhav Gogate Outline of lectures Review of
More informationStatistical learning. Chapter 20, Sections 1 4 1
Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationCS 6140: Machine Learning Spring 2016
CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa?on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Logis?cs Assignment
More informationLearning Bayesian Networks for Biomedical Data
Learning Bayesian Networks for Biomedical Data Faming Liang (Texas A&M University ) Liang, F. and Zhang, J. (2009) Learning Bayesian Networks for Discrete Data. Computational Statistics and Data Analysis,
More informationLearning With Bayesian Networks. Markus Kalisch ETH Zürich
Learning With Bayesian Networks Markus Kalisch ETH Zürich Inference in BNs  Review P(Burglary JohnCalls=TRUE, MaryCalls=TRUE) Exact Inference: P(b j,m) = c Sum e Sum a P(b)P(e)P(a b,e)p(j a)p(m a) Deal
More informationCS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning
CS 446 Machine Learning Fall 206 Nov 0, 206 Bayesian Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Bayesian Learning Naive Bayes Logistic Regression Bayesian Learning So far, we
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More informationStatistical learning. Chapter 20, Sections 1 3 1
Statistical learning Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationLecture 2: Simple Classifiers
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 2: Simple Classifiers Slides based on Rich Zemel s All lecture slides will be available on the course website: www.cs.toronto.edu/~jessebett/csc412
More informationBayes Rule. CS789: Machine Learning and Neural Network Bayesian learning. A Side Note on Probability. What will we learn in this lecture?
Bayes Rule CS789: Machine Learning and Neural Network Bayesian learning P (Y X) = P (X Y )P (Y ) P (X) Jakramate Bootkrajang Department of Computer Science Chiang Mai University P (Y ): prior belief, prior
More informationClustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26
Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar
More information6.867 Machine learning, lecture 23 (Jaakkola)
Lecture topics: Markov Random Fields Probabilistic inference Markov Random Fields We will briefly go over undirected graphical models or Markov Random Fields (MRFs) as they will be needed in the context
More informationBayesian Networks Structure Learning (cont.)
Koller & Friedman Chapters (handed out): Chapter 11 (short) Chapter 1: 1.1, 1., 1.3 (covered in the beginning of semester) 1.4 (Learning parameters for BNs) Chapter 13: 13.1, 13.3.1, 13.4.1, 13.4.3 (basic
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationOverview of Course. Nevin L. Zhang (HKUST) Bayesian Networks Fall / 58
Overview of Course So far, we have studied The concept of Bayesian network Independence and Separation in Bayesian networks Inference in Bayesian networks The rest of the course: Data analysis using Bayesian
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:
More informationLearning Terminological Naïve Bayesian Classifiers Under Different Assumptions on Missing Knowledge
Learning Terminological Naïve Bayesian Classifiers Under Different Assumptions on Missing Knowledge Pasquale Minervini Claudia d Amato Nicola Fanizzi Department of Computer Science University of Bari URSW
More informationLanguage as a Stochastic Process
CS769 Spring 2010 Advanced Natural Language Processing Language as a Stochastic Process Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu 1 Basic Statistics for NLP Pick an arbitrary letter x at random from any
More informationLearning in Bayesian Networks
Learning in Bayesian Networks Florian Markowetz MaxPlanckInstitute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks
More informationProbabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov
Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly
More informationProbabilistic Reasoning
Course 16 :198 :520 : Introduction To Artificial Intelligence Lecture 7 Probabilistic Reasoning Abdeslam Boularias Monday, September 28, 2015 1 / 17 Outline We show how to reason and act under uncertainty.
More informationPAClearning, VC Dimension and Marginbased Bounds
More details: General: http://www.learningwithkernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAClearning, VC Dimension and Marginbased
More informationSoft Computing. Lecture Notes on Machine Learning. Matteo Matteucci.
Soft Computing Lecture Notes on Machine Learning Matteo Matteucci matteucci@elet.polimi.it Department of Electronics and Information Politecnico di Milano Matteo Matteucci c Lecture Notes on Machine Learning
More informationA brief introduction to Conditional Random Fields
A brief introduction to Conditional Random Fields Mark Johnson Macquarie University April, 2005, updated October 2010 1 Talk outline Graphical models Maximum likelihood and maximum conditional likelihood
More informationIntroduction to Bayesian Learning
Course Information Introduction Introduction to Bayesian Learning Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Apprendimento Automatico: Fondamenti  A.A. 2016/2017 Outline
More informationp L yi z n m x N n xi
y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationA Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007
Decision Trees, cont. Boosting Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 1 st, 2007 1 A Decision Stump 2 1 The final tree 3 Basic Decision Tree Building Summarized
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongamro, Namgu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationStudy Notes on the Latent Dirichlet Allocation
Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection
More informationThe Bayes classifier
The Bayes classifier Consider where is a random vector in is a random variable (depending on ) Let be a classifier with probability of error/risk given by The Bayes classifier (denoted ) is the optimal
More informationCS446: Machine Learning Fall Final Exam. December 6 th, 2016
CS446: Machine Learning Fall 2016 Final Exam December 6 th, 2016 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. This exam booklet contains
More informationMACHINE LEARNING AND PATTERN RECOGNITION Fall 2006, Lecture 8: Latent Variables, EM Yann LeCun
Y. LeCun: Machine Learning and Pattern Recognition p. 1/? MACHINE LEARNING AND PATTERN RECOGNITION Fall 2006, Lecture 8: Latent Variables, EM Yann LeCun The Courant Institute, New York University http://yann.lecun.com
More informationIntroduction to Graphical Models
Introduction to Graphical Models The 15 th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang 2018. 1. 9 (Tue.) YungKyun Noh GENERALIZATION FOR PREDICTION 2 Probabilistic
More informationFEEG6017 lecture: Akaike's information criterion; model reduction. Brendan Neville
FEEG6017 lecture: Akaike's information criterion; model reduction Brendan Neville bjn1c13@ecs.soton.ac.uk Occam's razor William of Occam, 12881348. All else being equal, the simplest explanation is the
More informationMachine Learning CMPT 726 Simon Fraser University. Binomial Parameter Estimation
Machine Learning CMPT 726 Simon Fraser University Binomial Parameter Estimation Outline Maximum Likelihood Estimation Smoothed Frequencies, Laplace Correction. Bayesian Approach. Conjugate Prior. Uniform
More informationMIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE
MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE March 28, 2012 The exam is closed book. You are allowed a double sided one page cheat sheet. Answer the questions in the spaces provided on
More informationCMPSCI 240: Reasoning about Uncertainty
CMPSCI 240: Reasoning about Uncertainty Lecture 17: Representing Joint PMFs and Bayesian Networks Andrew McGregor University of Massachusetts Last Compiled: April 7, 2017 Warm Up: Joint distributions Recall
More informationExpectationMaximization (EM) algorithm
I529: Machine Learning in Bioinformatics (Spring 2017) ExpectationMaximization (EM) algorithm Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2017 Contents Introduce
More informationA.I. in health informatics lecture 2 clinical reasoning & probabilistic inference, I. kevin small & byron wallace
A.I. in health informatics lecture 2 clinical reasoning & probabilistic inference, I kevin small & byron wallace today a review of probability random variables, maximum likelihood, etc. crucial for clinical
More informationApproximate Inference using MCMC
Approximate Inference using MCMC 9.520 Class 22 Ruslan Salakhutdinov BCS and CSAIL, MIT 1 Plan 1. Introduction/Notation. 2. Examples of successful Bayesian models. 3. Basic Sampling Algorithms. 4. Markov
More informationMachine learning: lecture 20. Tommi S. Jaakkola MIT CSAIL
Machine learning: lecture 20 ommi. Jaakkola MI CAI tommi@csail.mit.edu opics Representation and graphical models examples Bayesian networks examples, specification graphs and independence associated distribution
More informationIntroduction to Probabilistic Reasoning. Image credit: NASA. Assignment
Introduction to Probabilistic Reasoning Brian C. Williams 16.410/16.413 November 17 th, 2010 11/17/10 copyright Brian Williams, 200510 1 Brian C. Williams, copyright 200009 Image credit: NASA. Assignment
More informationIntroduction to Probabilistic Graphical Models
Introduction to Probabilistic Graphical Models KyuBaek Hwang and ByoungTak Zhang Biointelligence Lab School of Computer Science and Engineering Seoul National University Seoul 151742 Korea Email: kbhwang@bi.snu.ac.kr
More informationMachine Learning: Probability Theory
Machine Learning: Probability Theory Prof. Dr. Martin Riedmiller AlbertLudwigsUniversity Freiburg AG Maschinelles Lernen Theories p.1/28 Probabilities probabilistic statements subsume different effects
More informationModel comparison and selection
BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)
More informationIntroduction to Probability for Graphical Models
Introduction to Probability for Grahical Models CSC 4 Kaustav Kundu Thursday January 4, 06 *Most slides based on Kevin Swersky s slides, Inmar Givoni s slides, Danny Tarlow s slides, Jaser Snoek s slides,
More informationBANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1
BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013) ISLR Data Mining I Lecture
More informationOutline. Spring It Introduction Representation. Markov Random Field. Conclusion. Conditional Independence Inference: Variable elimination
Probabilistic Graphical Models COMP 79090 Seminar Spring 2011 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Outline It Introduction ti Representation Bayesian network Conditional Independence Inference:
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationSome Probability and Statistics
Some Probability and Statistics David M. Blei COS424 Princeton University February 13, 2012 Card problem There are three cards Red/Red Red/Black Black/Black I go through the following process. Close my
More informationText Categorization CSE 454. (Based on slides by Dan Weld, Tom Mitchell, and others)
Text Categorization CSE 454 (Based on slides by Dan Weld, Tom Mitchell, and others) 1 Given: Categorization A description of an instance, x X, where X is the instance language or instance space. A fixed
More informationInternational Journal of Uncertainty, Fuzziness and KnowledgeBased Systems c World Scientific Publishing Company
International Journal of Uncertainty, Fuzziness and KnowledgeBased Systems c World Scientific Publishing Company UNSUPERVISED LEARNING OF BAYESIAN NETWORKS VIA ESTIMATION OF DISTRIBUTION ALGORITHMS: AN
More informationNaïve Bayes. Vibhav Gogate The University of Texas at Dallas
Naïve Bayes Vibhav Gogate The University of Texas at Dallas Supervised Learning of Classifiers Find f Given: Training set {(x i, y i ) i = 1 n} Find: A good approximation to f : X Y Examples: what are
More informationRegression I: Mean Squared Error and Measuring Quality of Fit
Regression I: Mean Squared Error and Measuring Quality of Fit Applied Multivariate Analysis Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationNotes on Markov Networks
Notes on Markov Networks Lili Mou moull12@sei.pku.edu.cn December, 2014 This note covers basic topics in Markov networks. We mainly talk about the formal definition, Gibbs sampling for inference, and maximum
More informationMachine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014
Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,
More informationMachine Learning
Machine Learning 10701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 13, 2011 Today: The Big Picture Overfitting Review: probability Readings: Decision trees, overfiting
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam
ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The
More informationIntroduction to Gaussian Process
Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Nonparametric Gaussian Process (GP) GP Regression
More informationProbability Theory. Introduction to Probability Theory. Principles of Counting Examples. Principles of Counting. Probability spaces.
Probability Theory To start out the course, we need to know something about statistics and probability Introduction to Probability Theory L645 Advanced NLP Autumn 2009 This is only an introduction; for
More informationMachine Learning, Midterm Exam
10601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv BarJoseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have
More informationClassification: The rest of the story
U NIVERSITY OF ILLINOIS AT URBANACHAMPAIGN CS598 Machine Learning for Signal Processing Classification: The rest of the story 3 October 2017 Today s lecture Important things we haven t covered yet Fisher
More informationPHASES OF STATISTICAL ANALYSIS 1. Initial Data Manipulation Assembling data Checks of data quality  graphical and numeric
PHASES OF STATISTICAL ANALYSIS 1. Initial Data Manipulation Assembling data Checks of data quality  graphical and numeric 2. Preliminary Analysis: Clarify Directions for Analysis Identifying Data Structure:
More informationProbability Theory. Prof. Dr. Martin Riedmiller AG Maschinelles Lernen AlbertLudwigsUniversität Freiburg.
Probability Theory Prof. Dr. Martin Riedmiller AG Maschinelles Lernen AlbertLudwigsUniversität Freiburg riedmiller@informatik.unifreiburg.de Prof. Dr. Martin Riedmiller Machine Learning Lab, University
More informationLEARNING WITH BAYESIAN NETWORKS
LEARNING WITH BAYESIAN NETWORKS Author: David Heckerman Presented by: Dilan Kiley Adapted from slides by: Yan Zhang  2006, Jeremy Gould 2013, Chip Galusha 2014 Jeremy Gould 2013Chip Galus May 6th, 2016
More information