Bayesian Machine Learning


1 Bayesian Machine Learning. Andrew Gordon Wilson. ORIE 6741 Lecture 4: Occam's Razor, Model Construction, and Directed Graphical Models. Cornell University, September 1, / 46
2 References: Bishop (2006), MacKay (2003), Rasmussen and Ghahramani (2001), Ghahramani (2015), Ghahramani (2014), Wilson (2014).
3 Bayesian Modelling (Theory of Everything)
4 Regularisation = MAP ≠ Bayesian Inference. Example: Density Estimation. Observations $y_1, \dots, y_N$ drawn from unknown density $p(y)$. Model: $p(y \mid \theta) = w_1 \mathcal{N}(y \mid \mu_1, \sigma_1^2) + w_2 \mathcal{N}(y \mid \mu_2, \sigma_2^2)$, with $\theta = \{w_1, w_2, \mu_1, \mu_2, \sigma_1, \sigma_2\}$. Likelihood: $p(\mathbf{y} \mid \theta) = \prod_{i=1}^{N} p(y_i \mid \theta)$. Can learn all free parameters $\theta$ using maximum likelihood...
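5 Regularisation = MAP ≠ Bayesian Inference. Regularisation or MAP: find $\operatorname{argmax}_{\theta} \log p(\theta \mid \mathbf{y}) \overset{c}{=} \overbrace{\log p(\mathbf{y} \mid \theta)}^{\text{model fit}} + \overbrace{\log p(\theta)}^{\text{complexity penalty}}$. Choose $p(\theta)$ such that $p(\theta) \to 0$ faster than $p(\mathbf{y} \mid \theta) \to \infty$ as $\sigma_1$ or $\sigma_2 \to 0$. Bayesian Inference: Predictive Distribution: $p(y_* \mid \mathbf{y}) = \int p(y_* \mid \theta)\, p(\theta \mid \mathbf{y})\, d\theta$. Parameter Posterior: $p(\theta \mid \mathbf{y}) \propto p(\mathbf{y} \mid \theta)\, p(\theta)$. $p(\theta)$ need not be zero anywhere in order to make reasonable inferences. Can use a sampling scheme, with conjugate posterior updates for each separate mixture component, using an inverse Gamma prior on the variances $\sigma_1^2, \sigma_2^2$.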
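The singularity that motivates the prior can be seen numerically. The following is a minimal sketch, not the lecture's own code: the data grid, the fixed second component, and the inverse-Gamma hyperparameters `alpha` and `beta` are all illustrative assumptions. It centres one mixture component on a single data point and shrinks its standard deviation.

```python
import numpy as np

def log_gauss(y, mu, sigma):
    """Log density of N(y | mu, sigma^2), elementwise."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2

def mixture_loglik(y, w1, mu1, s1, mu2, s2):
    """log p(y | theta) for the two-component mixture, with w2 = 1 - w1."""
    a = np.log(w1) + log_gauss(y, mu1, s1)
    b = np.log(1.0 - w1) + log_gauss(y, mu2, s2)
    return np.sum(np.logaddexp(a, b))

def log_inv_gamma(var, alpha=2.0, beta=1.0):
    """Unnormalised log density of an inverse-Gamma(alpha, beta) prior on a variance."""
    return -(alpha + 1.0) * np.log(var) - beta / var

y = np.linspace(-2.0, 2.0, 21)           # toy observations (assumed, deterministic)

for s1 in [1e-2, 1e-3, 1e-4]:
    # Component 1 sits exactly on y[0]; component 2 is a fixed N(0, 1).
    ll = mixture_loglik(y, 0.5, y[0], s1, 0.0, 1.0)
    penalised = ll + log_inv_gamma(s1**2)
    print(f"sigma_1={s1:g}  log-lik={ll:.2f}  log-lik + log-prior={penalised:.2f}")
```

In this sketch the log-likelihood keeps growing as $\sigma_1$ shrinks, while the penalised objective falls, because the assumed prior goes to zero faster than the likelihood diverges.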
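6 Model Selection and Marginal Likelihood. $p(\mathbf{y} \mid \mathcal{M}_1, X) = \int p(\mathbf{y} \mid f_1(x, \mathbf{w}))\, p(\mathbf{w})\, d\mathbf{w}$ (1). [Figure: $p(\mathbf{y} \mid \mathcal{M})$ against all possible datasets $\mathbf{y}$, for a complex model, a simple model, and an appropriate model.]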
7 Model Comparison. $\dfrac{p(H_1 \mid D)}{p(H_2 \mid D)} = \dfrac{p(D \mid H_1)}{p(D \mid H_2)} \, \dfrac{p(H_1)}{p(H_2)}$. (2)
8 Blackboard: Examples of Occam's Razor in Everyday Inferences. For further reading, see the MacKay (2003) textbook, Information Theory, Inference, and Learning Algorithms.
9 Occam's Razor Example. $-1, 3, 7, 11, ??, ??$ $H_1$: the sequence is an arithmetic progression, "add $n$", where $n$ is an integer. $H_2$: the sequence is generated by a cubic function of the form $x \to cx^3 + dx^2 + e$, where $c$, $d$, and $e$ are fractions, here $x \to -\tfrac{1}{11}x^3 + \tfrac{9}{11}x^2 + \tfrac{23}{11}$ (the example in MacKay (2003)).
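Both hypotheses reproduce the observed terms exactly, which is why the comparison has to come from the evidence rather than the fit. A small sketch that checks this, taking the sequence as −1, 3, 7, 11 and the cubic coefficients −1/11, 9/11, 23/11 from MacKay (2003); the function names are hypothetical:

```python
from fractions import Fraction

def h1_next(x, n=4):
    """H1: arithmetic progression, add n to the previous term."""
    return x + n

def h2_next(x, c=Fraction(-1, 11), d=Fraction(9, 11), e=Fraction(23, 11)):
    """H2: next term is c*x^3 + d*x^2 + e applied to the previous term."""
    return c * x**3 + d * x**2 + e

data = [-1, 3, 7, 11]
for prev, nxt in zip(data, data[1:]):
    assert h1_next(prev) == nxt              # H1 fits every observed step
    assert h2_next(Fraction(prev)) == nxt    # so does H2, using exact fractions

x5 = h2_next(Fraction(11))
print("H1 predicts:", h1_next(11), h1_next(h1_next(11)))
print("H2 predicts:", float(x5), float(h2_next(x5)))
```

H1 needs one integer to fit the data, whereas H2 needs three fractions to land on exactly the right values, so H2 spreads its probability over far more possible sequences.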
10 Model Selection. [Figure: outputs $y(x)$ against inputs $x$.] Observations $y(x)$. Assume $p(y(x) \mid f(x)) = \mathcal{N}(y(x); f(x), \sigma^2)$. Consider polynomials of different orders. As always, observations are out of the chosen model class! Which model should we choose? $f_0(x) = a_0$ (3), $f_1(x) = a_0 + a_1 x$ (4), $f_2(x) = a_0 + a_1 x + a_2 x^2$ (5), $\vdots$ (6), $f_J(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_J x^J$ (7).
11 Model Selection: Occam's Hill. [Figure: marginal likelihood (evidence) against model order.] Marginal likelihood (evidence) as a function of model order, using an isotropic prior $p(\mathbf{a}) = \mathcal{N}(0, \sigma^2 I)$.
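For polynomial models with an isotropic Gaussian prior on the coefficients and Gaussian noise, the evidence is available in closed form, since marginally $\mathbf{y} \sim \mathcal{N}(0, \sigma_a^2 \Phi\Phi^\top + \sigma^2 I)$. The sketch below evaluates it for several orders; the toy data, `prior_var`, and `noise_var` are assumptions chosen for illustration and are not the settings behind the lecture's figure.

```python
import numpy as np

def log_evidence(x, y, order, prior_var=1.0, noise_var=0.1):
    """log p(y | M_order) for f(x) = sum_j a_j x^j with a ~ N(0, prior_var * I)
    and Gaussian noise, using y ~ N(0, prior_var * Phi Phi^T + noise_var * I)."""
    Phi = np.vander(x, order + 1, increasing=True)    # columns 1, x, ..., x^order
    K = prior_var * Phi @ Phi.T + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)                         # K = L L^T
    alpha = np.linalg.solve(L, y)                     # alpha @ alpha = y^T K^{-1} y
    return (-0.5 * alpha @ alpha
            - np.sum(np.log(np.diag(L)))              # equals 0.5 * log det K
            - 0.5 * len(x) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = 0.5 - x + 2.0 * x**2 + rng.normal(scale=np.sqrt(0.1), size=x.size)  # toy quadratic data

for order in range(8):
    print(order, round(log_evidence(x, y, order), 2))
```

Plotting these values against `order` gives a curve of the same kind as the one on the slide, though its exact shape depends on the assumed prior and noise variances.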
12 Model Selection: Occam's Asymptote. [Figure: marginal likelihood (evidence) against model order.] Marginal likelihood (evidence) as a function of model order, using an anisotropic prior $p(a_i) = \mathcal{N}(0, \gamma^{-i})$, with $\gamma$ learned from the data.
13 Occam's Razor. [Figure: marginal likelihood (evidence) against model order; (a) Isotropic Gaussian Prior, (b) Anisotropic Gaussian Prior.] For further reading, see Rasmussen and Ghahramani (2001) (Occam's Razor), Kass and Raftery (1995) (Bayes Factors), and MacKay (2003), Chapter / 46
14 Automatic Choice of Dimensionality for PCA. PCA projects a $d$-dimensional vector $\mathbf{x}$ into a $k \le d$ dimensional space in a way that maximizes the variance of the projection. How do we choose $k$?
15 Probabilistic PCA. Formulate dimensionality reduction as a probabilistic model: $\mathbf{x} = \sum_{j=1}^{k} \mathbf{h}_j w_j + \mathbf{m} + \boldsymbol{\epsilon}$ (8) $= H\mathbf{w} + \mathbf{m} + \boldsymbol{\epsilon}$ (9), with $\boldsymbol{\epsilon} \sim \mathcal{N}(0, V)$ (10). Let $V = v I_d$ and $p(\mathbf{w}) = \mathcal{N}(0, I_k)$. The maximum likelihood solution for $H$, given data $D = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$, is exactly equal to the PCA solution! Let's place probability distributions over $H$ and $\mathbf{m}$, integrate them away from the likelihood, then use the evidence $p(D \mid k)$ to determine the value of $k$. As $N \to \infty$, the evidence will collapse onto the true value of $k$. See Automatically Learning the Dimensionality of PCA (Minka, 2001).
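To make the generative model concrete, the sketch below samples from it and compares the sample covariance with the implied marginal covariance $HH^\top + vI_d$. The sizes, the matrix `H`, the mean `m`, and the noise level `v` are arbitrary illustrative choices; this is not Minka's evidence computation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, N, v = 5, 2, 200_000, 0.1                 # illustrative sizes and noise level
H = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 1.0], [1.0, -1.0], [0.2, 0.3]])
m = np.arange(d, dtype=float)

w = rng.normal(size=(N, k))                     # w ~ N(0, I_k)
eps = np.sqrt(v) * rng.normal(size=(N, d))      # eps ~ N(0, v I_d)
X = w @ H.T + m + eps                           # x = H w + m + eps, one row per sample

C_model = H @ H.T + v * np.eye(d)               # marginal covariance of x
C_sample = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(C_sample)[::-1]    # eigenvalues, largest first

print(np.round(np.abs(C_sample - C_model).max(), 3))
print(np.round(eigvals, 3))
```

In this toy setting the spectrum shows $k$ large eigenvalues followed by $d - k$ values close to $v$, which gives some intuition for why the data can carry information about $k$.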
16 Automatically Learning the Dimensionality of PCA
17 Automatically Learning the Dimensionality of PCA
18 Automatically Learning the Dimensionality of PCA
19 Automatically Learning the Dimensionality of PCA
20 Automatically Learning the Dimensionality of PCA
21 Model Construction: Support and Inductive Biases. Support: which datasets (hypotheses) are a priori possible. Inductive biases: which datasets are a priori likely. We want the support of our model to be as big as possible, with inductive biases calibrated to particular applications, so that we do not rule out potential explanations of the data while still learning quickly from a finite amount of information on a particular application. Examples (discussion and illustrations with respect to the figure on slide 6): human learning and deep learning.
22 Graphical Models. Open circles correspond to random variables. Filled circles correspond to observed random variables (whiteboard). Small closed circles correspond to deterministic variables (whiteboard). Square boxes show factor decompositions. Edges represent statistical dependencies between variables. The whole model represents a joint probability distribution.
23 Graphical Models (Motivation). Graphs are an intuitive way of representing and visualising the relationships between many variables. (Examples: family trees, electric circuit diagrams, neural networks.) A graph allows us to abstract out the conditional independencies between variables from the details of their parametric forms (whiteboard). We can answer questions like "Is $A$ dependent on $B$ given that we know the value of $C$?" just by looking at the graph. Graphical models allow us to define general message passing algorithms that implement probabilistic inference efficiently. Thus we can answer queries like "What is $p(A \mid C = c)$?" without enumerating all settings of variables in the model.
24 Independencies
25 Examples of Conditional Independencies
26 Group Discussion: Conditional Independence
27 Directed Graphical Model. Model represents a joint distribution. Edges show dependencies. Example (fully connected graph): $p(a, b, c) = p(a \mid b, c)\, p(b \mid c)\, p(c)$. Is this a unique representation of $p(a, b, c)$?
28 Directed Graphical Model. Model represents a joint distribution. Edges show dependencies. Example (fully connected graph): $p(a, b, c) = p(a \mid b, c)\, p(b \mid c)\, p(c)$. Is this a unique representation of $p(a, b, c)$? For a fully connected graph: $p(x_1, \dots, x_K) = p(x_K \mid x_1, \dots, x_{K-1}) \cdots p(x_2 \mid x_1)\, p(x_1)$ (11)
29 Sparse Directed Graphical Model. Group discussion: what's the joint distribution?
30 Joint distributions. For a graph with $K$ nodes, the joint distribution is given by $p(\mathbf{x}) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k)$ (12), where $\mathrm{pa}_k$ denotes the parents of node $x_k$.
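The factorisation in (12) translates directly into code. A minimal sketch for binary variables, using a hypothetical three-node chain a → c → b with made-up conditional probability tables:

```python
import itertools

# Hypothetical DAG: each node maps to the tuple of its parents.
parents = {"a": (), "c": ("a",), "b": ("c",)}
# cpt[node][parent_values] = p(node = 1 | parents = parent_values); numbers are made up.
cpt = {
    "a": {(): 0.3},
    "c": {(0,): 0.2, (1,): 0.9},
    "b": {(0,): 0.1, (1,): 0.7},
}

def joint(assign):
    """p(x) = prod_k p(x_k | pa_k) for a full assignment {node: 0 or 1}."""
    p = 1.0
    for node, pa in parents.items():
        p1 = cpt[node][tuple(assign[q] for q in pa)]
        p *= p1 if assign[node] == 1 else 1.0 - p1
    return p

# Summing the product of local conditionals over all assignments gives 1.
total = sum(joint(dict(zip(parents, vals)))
            for vals in itertools.product([0, 1], repeat=len(parents)))
print(total)
```

Each node stores only a table indexed by its parents, so a sparse graph needs far fewer numbers than the full joint table.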
31 Example: Polynomial Regression. $y = \mathbf{w}^{\top} \phi(x, v) + \epsilon$ (13), $\epsilon \sim \mathcal{N}(0, \sigma^2)$ (14), $\mathbf{w} \sim \mathcal{N}(0, \alpha^2)$ (15). What's the graphical model defining the joint distribution $p(\mathbf{w}, \mathbf{y})$, with $\mathbf{y} = (y_1, \dots, y_N)^{\top}$? How do we use this graphical model to infer $p(y_* \mid D, \alpha^2, \sigma^2, v)$? Group discussion.
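For this model the predictive distribution is Gaussian and can be computed in closed form. The sketch below assumes an isotropic prior $\mathbf{w} \sim \mathcal{N}(0, \alpha^2 I)$ and a fixed polynomial basis standing in for $\phi(x, v)$; the function names, the degree, and the toy data are illustrative assumptions.

```python
import numpy as np

def features(x, degree=3):
    """Hypothetical basis phi(x): polynomial features 1, x, ..., x^degree."""
    return np.vander(np.atleast_1d(x), degree + 1, increasing=True)

def predictive(x_train, y_train, x_star, alpha2=1.0, sigma2=0.05):
    """Mean and variance of p(y_* | D, alpha^2, sigma^2), assuming w ~ N(0, alpha2 * I)."""
    Phi = features(x_train)
    # Posterior precision of w: I / alpha^2 + Phi^T Phi / sigma^2
    A = np.eye(Phi.shape[1]) / alpha2 + Phi.T @ Phi / sigma2
    mean_w = np.linalg.solve(A, Phi.T @ y_train) / sigma2      # posterior mean of w
    Phi_s = features(x_star)
    mean = Phi_s @ mean_w
    # phi_*^T A^{-1} phi_* for each test input, plus observation noise
    var = np.sum(Phi_s * np.linalg.solve(A, Phi_s.T).T, axis=1) + sigma2
    return mean, var

x_train = np.linspace(-1, 1, 20)
y_train = np.sin(3 * x_train)                  # toy targets
mean, var = predictive(x_train, y_train, np.array([0.0, 3.0]))
print(mean, var)
```

The predictive variance is much larger at the second test input, which lies well outside the training range.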
32 Conditional Independencies
33 Conditional Independencies: Tail-Tail. $p(a, b) = \sum_c p(a, b, c) = \sum_c p(a \mid c)\, p(b \mid c)\, p(c) \neq p(a)\, p(b)$ in general (16), so $a \not\perp b$ (17): $a$ and $b$ are not marginally independent.
34 Tail-Tail Observed. Want to see whether $p(a, b \mid c) = p(a \mid c)\, p(b \mid c)$. $p(a, b \mid c) = \dfrac{p(a, b, c)}{p(c)} = \dfrac{p(a \mid c)\, p(b \mid c)\, p(c)}{p(c)} = p(a \mid c)\, p(b \mid c)$ (18). Therefore $a \perp b \mid c$.
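The tail-tail case can be checked numerically on a small discrete example. The probability tables below are made up for illustration; the sketch builds the joint $p(a, b, c) = p(a \mid c)\, p(b \mid c)\, p(c)$ and tests both independence statements.

```python
import numpy as np

p_c = np.array([0.4, 0.6])                       # p(c)
p_a_c = np.array([[0.9, 0.1], [0.2, 0.8]])       # p_a_c[c, a] = p(a | c)
p_b_c = np.array([[0.7, 0.3], [0.1, 0.9]])       # p_b_c[c, b] = p(b | c)

joint = np.einsum("c,ca,cb->abc", p_c, p_a_c, p_b_c)    # p(a, b, c), indexed [a, b, c]

# Marginal case: sum out c and compare p(a, b) with p(a) p(b).
p_ab = joint.sum(axis=2)
p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)
marginally_independent = np.allclose(p_ab, np.outer(p_a, p_b))

# Observed case: divide by p(c) and compare with p(a | c) p(b | c).
p_ab_given_c = joint / joint.sum(axis=(0, 1))
factorised = np.einsum("ca,cb->abc", p_a_c, p_b_c)
independent_given_c = np.allclose(p_ab_given_c, factorised)

print(marginally_independent, independent_given_c)
```

For these tables the first check fails and the second holds, matching (16) to (18).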
35 Tail-Head. $p(a, b) = \sum_c p(a, b, c) = \sum_c p(a)\, p(c \mid a)\, p(b \mid c) = \sum_c p(a, c)\, p(b \mid c)$ (19) $= p(a)\, p(b \mid a) \neq p(a)\, p(b)$ in general (20), so $a \not\perp b$: $a$ and $b$ are not marginally independent.
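36 Tail-Head Observed. Want to see whether $p(a, b \mid c) = p(a \mid c)\, p(b \mid c)$. $p(a, b \mid c) = \dfrac{p(a, b, c)}{p(c)} = \dfrac{p(a)\, p(c \mid a)\, p(b \mid c)}{p(c)}$ (21) $= \dfrac{p(a)\, p(c \mid a)}{p(c)}\, p(b \mid c) = p(a \mid c)\, p(b \mid c)$ (22). Therefore $a \perp b \mid c$.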
37 Head-Head. $p(a, b) = \sum_c p(a, b, c) = \sum_c p(a)\, p(b)\, p(c \mid a, b) = p(a)\, p(b)$ (23). $a$ is marginally independent of $b$.
38 Head-Head Observed. $p(a, b \mid c) = \dfrac{p(a, b, c)}{p(c)} = \dfrac{p(a)\, p(b)\, p(c \mid a, b)}{p(c)}$ (24), which does not factorise in general, so $a \not\perp b \mid c$. In all other cases observing $c$ blocked dependencies. However, here, observing $c$ creates dependencies! This phenomenon is called explaining away (think back to the sprinkler, rain, ground example).
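Explaining away can be illustrated with a sprinkler, rain, wet-ground model. All probabilities below are invented for the illustration: rain ($a$) and sprinkler ($b$) are independent causes of wet ground ($c$).

```python
import numpy as np

p_a = np.array([0.8, 0.2])              # p(rain); index 1 means it rained
p_b = np.array([0.7, 0.3])              # p(sprinkler); index 1 means it was on
# p_c1[a, b] = p(wet = 1 | rain = a, sprinkler = b), illustrative numbers
p_c1 = np.array([[0.01, 0.90], [0.90, 0.99]])

joint_wet = np.einsum("a,b,ab->ab", p_a, p_b, p_c1)      # p(a, b, c = 1)
p_ab_given_wet = joint_wet / joint_wet.sum()             # p(a, b | c = 1)

p_rain_given_wet = p_ab_given_wet[1].sum()
p_rain_given_wet_and_sprinkler = p_ab_given_wet[1, 1] / p_ab_given_wet[:, 1].sum()

print("p(rain)                      =", p_a[1])
print("p(rain | wet)                =", round(p_rain_given_wet, 3))
print("p(rain | wet, sprinkler on)  =", round(p_rain_given_wet_and_sprinkler, 3))
```

With these numbers, seeing wet ground raises the probability of rain, and then learning that the sprinkler was on lowers it again: once $c$ is observed, $b$ carries information about $a$.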
39 D-separation. Semantics: $X \perp Y \mid V$ if $V$ d-separates $X$ from $Y$. Definition: $V$ d-separates $X$ from $Y$ if every undirected path from $X$ to $Y$ is blocked by $V$. A path is blocked by $V$ if there is a node $W$ on the path such that either: 1. $W$ has converging arrows along the path ($\to W \leftarrow$) (head-head) and neither $W$ nor its descendants are observed (in $V$), or 2. $W$ does not have converging arrows along the path ($\to W \to$ or $\leftarrow W \to$) (head-tail or tail-tail) and $W$ is observed ($W \in V$). Corollary: Markov blanket of node $x_i$: $\{\text{parents} \cup \text{children} \cup \text{parents of children}\}$. $x_i$ is independent of everything else conditioned on this blanket.
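A d-separation query can be answered mechanically. The sketch below does not enumerate paths as in the definition above; it uses an equivalent test: restrict the graph to the ancestors of the nodes involved, moralise it, delete the observed nodes, and check reachability. The graph representation (a dict from node to parents) and the example edges are assumptions for illustration.

```python
def d_separated(parents, xs, ys, zs):
    """True if node sets xs and ys are d-separated by zs in the DAG `parents`
    (a dict mapping each node to a list of its parents)."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    # 1. Keep only xs, ys, zs and all of their ancestors.
    keep, stack = set(), list(xs | ys | zs)
    while stack:
        n = stack.pop()
        if n not in keep:
            keep.add(n)
            stack.extend(parents.get(n, []))
    # 2. Moralise: connect each node to its parents, and co-parents to each other.
    nbrs = {n: set() for n in keep}
    for n in keep:
        pa = list(parents.get(n, []))
        for i, p in enumerate(pa):
            nbrs[n].add(p)
            nbrs[p].add(n)
            for q in pa[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # 3. Remove the observed nodes and test whether xs can still reach ys.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        stack.extend(m for m in nbrs[n] if m not in zs and m not in seen)
    return not (seen & ys)

# Assumed example graph: a -> e, f -> e, f -> b, e -> c.
g = {"e": ["a", "f"], "b": ["f"], "c": ["e"]}
print(d_separated(g, {"a"}, {"b"}, {"c"}))   # observing a descendant of the collider e
print(d_separated(g, {"a"}, {"b"}, {"f"}))   # observing the tail-tail node f
```

On this assumed graph the first query is not d-separated (the head-head node e has an observed descendant) and the second is.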
40 D-separation Examples. Is $a \perp b \mid c$? Is $a \perp b \mid f$? How do deterministic parameters (denoted by small black circles), such as the noise variance $\sigma^2$ in our Bayesian basis regression model, behave with respect to d-separation?
41 Data sampled from a Gaussian distribution. If we condition on the mean $\mu$, the data $x_i$ are independent. But what if we look at the marginal distribution having integrated away $\mu$?
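One way to work through this question is to assume a Gaussian prior on the mean and a known noise variance. The slide does not state a prior, so $\mu_0$ and $\tau^2$ below are symbols introduced here for illustration.

```latex
\begin{align*}
x_i &= \mu + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2), \qquad \mu \sim \mathcal{N}(\mu_0, \tau^2), \\
\mathbb{E}[x_i] &= \mu_0, \\
\operatorname{cov}(x_i, x_j) &= \operatorname{var}(\mu) + \operatorname{cov}(\epsilon_i, \epsilon_j) = \tau^2 + \sigma^2 \delta_{ij}, \\
p(x_1, \dots, x_N) &= \mathcal{N}\!\left(\mu_0 \mathbf{1},\; \sigma^2 I + \tau^2 \mathbf{1}\mathbf{1}^{\top}\right).
\end{align*}
```

The off-diagonal entries $\tau^2$ are non-zero, so under these assumptions the $x_i$ are marginally dependent: $\mu$ is an unobserved tail-tail node on every path between them.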
42 Naive Inference
43 Exploiting Graph Structure for Efficiency
44 Prelude to Belief Propagation
45 Ideas behind Belief Propagation
46 Next class. Up next... Belief Propagation!
Undirected graphical models Semantics of probabilistic models over undirected graphs Parameters of undirected models Example applications COMP652 and ECSE608, February 16, 2017 1 Undirected graphical
More informationStephen Scott.
1 / 28 ian ian Optimal (Adapted from Ethem Alpaydin and Tom Mitchell) Naïve Nets sscott@cse.unl.edu 2 / 28 ian Optimal Naïve Nets Might have reasons (domain information) to favor some hypotheses/predictions
More informationIntroduction to Probabilistic Reasoning. Image credit: NASA. Assignment
Introduction to Probabilistic Reasoning Brian C. Williams 16.410/16.413 November 17 th, 2010 11/17/10 copyright Brian Williams, 200510 1 Brian C. Williams, copyright 200009 Image credit: NASA. Assignment
More informationCours 7 12th November 2014
Sum Product Algorithm and Hidden Markov Model 2014/2015 Cours 7 12th November 2014 Enseignant: Francis Bach Scribe: Pauline Luc, Mathieu Andreux 7.1 Sum Product Algorithm 7.1.1 Motivations Inference, along
More information4 : Exact Inference: Variable Elimination
10708: Probabilistic Graphical Models 10708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference
More informationBayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine
Bayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine Mike Tipping Gaussian prior Marginal prior: single α Independent α Cambridge, UK Lecture 3: Overview
More informationLatent Dirichlet Allocation
Latent Dirichlet Allocation 1 Directed Graphical Models William W. Cohen Machine Learning 10601 2 DGMs: The Burglar Alarm example Node ~ random variable Burglar Earthquake Arcs define form of probability
More informationBayesian Networks: Representation, Variable Elimination
Bayesian Networks: Representation, Variable Elimination CS 6375: Machine Learning Class Notes Instructor: Vibhav Gogate The University of Texas at Dallas We can view a Bayesian network as a compact representation
More informationSampling Rejection Sampling Importance Sampling Markov Chain Monte Carlo. Sampling Methods. Oliver Schulte  CMPT 419/726. Bishop PRML Ch.
Sampling Methods Oliver Schulte  CMP 419/726 Bishop PRML Ch. 11 Recall Inference or General Graphs Junction tree algorithm is an exact inference method for arbitrary graphs A particular tree structure
More information