Introduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning
Piyush Rai, Dept. of CSE, IIT Kanpur (Mini-course 1), Nov 03, 2015
Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1
Machine Learning
Detecting trends/patterns in the data; making predictions about future data.
Two schools of thought:
- Learning as optimization: fit a model to minimize some loss function
- Learning as inference: infer the parameters of the data-generating distribution
The two are not really completely disjoint ways of thinking about learning.
Plan for the Mini-course
A series of 4 talks:
- Introduction to Probabilistic and Bayesian Machine Learning (today)
- Case Study: Bayesian Linear Regression, Approx. Bayesian Inference (Nov 5)
- Nonparametric Bayesian modeling for function approximation (Nov 7)
- Nonparametric Bayesian modeling for clustering/dimensionality reduction (Nov 8)
Machine Learning via Probabilistic Modeling
Assume the data $X = \{x_1, \ldots, x_N\}$ are generated from a probabilistic model, usually assumed i.i.d. (independent and identically distributed):
    $x_1, \ldots, x_N \sim p(x \mid \theta)$
For i.i.d. data, the probability of the observed data X given the model parameters θ is
    $p(X \mid \theta) = p(x_1, \ldots, x_N \mid \theta) = \prod_{n=1}^{N} p(x_n \mid \theta)$
Here $p(x_n \mid \theta)$ denotes the likelihood w.r.t. data point n; its form depends on the type/characteristics of the data.
Some common probability distributions
Maximum Likelihood Estimation (MLE)
We wish to estimate the parameters θ from the observed data $\{x_1, \ldots, x_N\}$.
MLE does this by finding the θ that maximizes the (log-)likelihood $p(X \mid \theta)$:
    $\hat{\theta} = \arg\max_{\theta} \log p(X \mid \theta) = \arg\max_{\theta} \log \prod_{n=1}^{N} p(x_n \mid \theta) = \arg\max_{\theta} \sum_{n=1}^{N} \log p(x_n \mid \theta)$
MLE thus reduces to solving an optimization problem w.r.t. θ.
MLE has some nice theoretical properties (e.g., consistency as $N \to \infty$).
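As a concrete illustration (not from the slides; the data and names below are made up), MLE for a Bernoulli coin model has the closed-form solution $\hat{\theta} = \frac{1}{N}\sum_n x_n$, and a brute-force search over the log-likelihood should agree with it:

```python
import math

def bernoulli_log_lik(theta, xs):
    """Log-likelihood of i.i.d. Bernoulli data: sum_n log p(x_n | theta)."""
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in xs)

xs = [1, 0, 1, 1, 0, 1, 1, 1]  # 6 heads out of 8 tosses

# Closed-form MLE: the sample mean
theta_mle = sum(xs) / len(xs)

# Numerical check: maximize the log-likelihood over a fine grid
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: bernoulli_log_lik(t, xs))

print(theta_mle)   # 0.75
print(theta_grid)  # 0.75 (up to grid resolution)
```

The grid search is only there to make the "learning as optimization" view tangible; in practice one would use the closed form or a gradient method.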
Injecting Prior Knowledge
Often, we might a priori know something about the parameters.
A prior distribution p(θ) can encode/specify this knowledge.
Bayes' rule gives us the posterior distribution over θ, which reflects our updated knowledge about θ using the observed data:
    $p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)} = \frac{\overbrace{p(X \mid \theta)}^{\text{Likelihood}}\ \overbrace{p(\theta)}^{\text{Prior}}}{\int_{\theta} p(X \mid \theta)\, p(\theta)\, d\theta}$
Note: θ is now a random variable.
Maximum-a-Posteriori (MAP) Estimation
MAP estimation finds the θ that maximizes the posterior $p(\theta \mid X) \propto p(X \mid \theta)\, p(\theta)$:
    $\hat{\theta} = \arg\max_{\theta} \log \prod_{n=1}^{N} p(x_n \mid \theta)\, p(\theta) = \arg\max_{\theta} \left[ \sum_{n=1}^{N} \log p(x_n \mid \theta) + \log p(\theta) \right]$
MAP thus reduces to solving an optimization problem w.r.t. θ.
The objective function is very similar to MLE's, except for the extra log p(θ) term. In some sense, MAP is just a regularized MLE.
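As a sketch of the "regularized MLE" view (assuming a Bernoulli likelihood with a Beta(a, b) prior, which is not an example worked on this slide), the MAP estimate has a closed form in which the prior behaves like pseudo-observations:

```python
def mle(xs):
    """Plain MLE for a Bernoulli bias: the sample mean."""
    return sum(xs) / len(xs)

def map_estimate(xs, a, b):
    """MAP for a Bernoulli bias under a Beta(a, b) prior.
    argmax_theta [sum_n log p(x_n|theta) + log p(theta)] has this closed form:
    the prior contributes (a-1) pseudo-heads and (b-1) pseudo-tails."""
    return (sum(xs) + a - 1) / (len(xs) + a + b - 2)

xs = [1, 1, 1]                  # 3 heads in 3 tosses
print(mle(xs))                  # 1.0 -- the MLE overfits small samples
print(map_estimate(xs, 2, 2))   # 0.8 -- the Beta(2, 2) prior pulls the estimate toward 0.5
```

The log p(θ) term is exactly what shrinks the estimate away from the data-only answer, mirroring an L2/L1 penalty in regularized loss minimization.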
Bayesian Learning
Both MLE and MAP only give a point estimate (a single "best" answer) of θ.
How can we capture/quantify the uncertainty in θ? We need to infer the full posterior distribution:
    $p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)} = \frac{\overbrace{p(X \mid \theta)}^{\text{Likelihood}}\ \overbrace{p(\theta)}^{\text{Prior}}}{\int_{\theta} p(X \mid \theta)\, p(\theta)\, d\theta}$
This requires fully Bayesian inference, which is sometimes a somewhat easy and sometimes a (very) hard problem.
A Simple Example of Bayesian Inference
We want to estimate a coin's bias $\theta \in (0, 1)$ based on N tosses.
The likelihood model: $x_1, \ldots, x_N \sim \text{Bernoulli}(\theta)$, with $p(x_n \mid \theta) = \theta^{x_n} (1 - \theta)^{1 - x_n}$
The prior: $\theta \sim \text{Beta}(a, b)$, with $p(\theta \mid a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, \theta^{a - 1} (1 - \theta)^{b - 1}$
The posterior:
    $p(\theta \mid X) \propto \prod_{n=1}^{N} p(x_n \mid \theta)\, p(\theta \mid a, b) \propto \prod_{n=1}^{N} \theta^{x_n} (1 - \theta)^{1 - x_n}\, \theta^{a - 1} (1 - \theta)^{b - 1} = \theta^{a + \sum_{n=1}^{N} x_n - 1} (1 - \theta)^{b + N - \sum_{n=1}^{N} x_n - 1}$
Thus the posterior is $\text{Beta}(a + \sum_{n=1}^{N} x_n,\; b + N - \sum_{n=1}^{N} x_n)$.
Here, the posterior has the same form as the prior (both Beta). This also makes it very easy to perform online inference.
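The conjugate update above can be sketched in a few lines (the prior and the tosses below are hypothetical values for illustration); note that feeding the tosses one at a time gives the same posterior as a single batch update, which is exactly what makes online inference easy:

```python
def beta_posterior(a, b, xs):
    """Conjugate update: Beta(a, b) prior + Bernoulli tosses -> Beta posterior."""
    heads = sum(xs)
    return a + heads, b + len(xs) - heads

# Batch update: Beta(2, 2) prior, 3 heads and 1 tail
a_post, b_post = beta_posterior(2, 2, [1, 0, 1, 1])
print(a_post, b_post)  # 5 3

# Online inference: process one toss at a time, same answer
a, b = 2, 2
for x in [1, 0, 1, 1]:
    a, b = beta_posterior(a, b, [x])
print(a, b)            # 5 3

# Posterior mean of theta is a / (a + b)
print(a / (a + b))     # 0.625
```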
Conjugate Priors
Recall $p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)}$.
Given some data distribution (likelihood) $p(X \mid \theta)$ and a prior $p(\theta) = \pi(\theta \mid \alpha)$, the prior is said to be conjugate if the posterior has the same form as the prior, i.e.,
    $p(\theta \mid X, \alpha) = \frac{p(X \mid \theta)\, \pi(\theta \mid \alpha)}{p(X)} = \pi(\theta \mid \alpha')$
Several pairs of distributions are conjugate to each other, e.g.:
- Gaussian-Gaussian
- Beta-Bernoulli
- Beta-Binomial
- Gamma-Poisson
- Dirichlet-Multinomial
A Non-Conjugate Case
We want to learn a classifier θ for predicting a label $x \in \{-1, +1\}$ for a point z.
Assume a logistic likelihood model for the labels:
    $p(x_n \mid \theta) = \frac{1}{1 + \exp(-x_n\, \theta^{\top} z_n)}$
The prior: $\theta \sim \text{Normal}(\mu, \Sigma)$ (a Gaussian, not conjugate to the logistic):
    $p(\theta \mid \mu, \Sigma) \propto \exp\left(-\tfrac{1}{2} (\theta - \mu)^{\top} \Sigma^{-1} (\theta - \mu)\right)$
The posterior $p(\theta \mid X) \propto \prod_{n=1}^{N} p(x_n \mid \theta)\, p(\theta \mid \mu, \Sigma)$ does not have a closed form.
Approximate Bayesian inference is needed in such cases:
- Sampling-based approximations: MCMC methods
- Optimization-based approximations: Variational Bayes, Laplace approximation, etc.
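As a sketch of the sampling-based route, here is a minimal random-walk Metropolis sampler for a 1-D version of this model (scalar θ, made-up data; this is an illustration of MCMC in general, not the specific method used later in the course):

```python
import math
import random

def log_posterior(theta, data, sigma2=1.0):
    """Unnormalized log posterior: Normal(0, sigma2) prior + logistic likelihood."""
    lp = -0.5 * theta ** 2 / sigma2                      # log prior (up to a constant)
    for x, z in data:                                    # x in {-1, +1}, z a scalar feature
        lp += -math.log1p(math.exp(-x * theta * z))      # log p(x | theta, z) = log sigmoid(x*theta*z)
    return lp

def metropolis(data, n_iters=5000, step=0.5, seed=0):
    rng = random.Random(seed)
    theta, lp = 0.0, log_posterior(0.0, data)
    samples = []
    for _ in range(n_iters):
        prop = theta + rng.gauss(0.0, step)              # random-walk proposal
        lp_prop = log_posterior(prop, data)
        if math.log(rng.random()) < lp_prop - lp:        # Metropolis accept/reject
            theta, lp = prop, lp_prop
        samples.append(theta)
    return samples

# Labels follow the sign of the feature, so the posterior should favor theta > 0
data = [(+1, 2.0), (+1, 1.0), (-1, -1.5), (-1, -0.5), (+1, 0.7), (-1, -1.2)]
samples = metropolis(data)
posterior_mean = sum(samples[1000:]) / len(samples[1000:])
print(posterior_mean > 0)  # True
```

The sampler only ever evaluates the unnormalized posterior, so the intractable evidence $p(X)$ never needs to be computed.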
Benefits of Bayesian Modeling
Our estimate of θ is not a single value (a "point") but a distribution.
We can model and quantify the uncertainty (or "variance") in θ via p(θ | X), and use this uncertainty in various tasks such as diagnosis and predictions, e.g., making predictions by averaging over all possible values of θ:
    $p(y^* \mid x^*, X, Y) = \mathbb{E}_{p(\theta \mid \cdot)}\left[ p(y^* \mid x^*, \theta) \right] = \int p(y^* \mid x^*, \theta)\, p(\theta \mid X, Y)\, d\theta$
This also allows quantifying the uncertainty in the predictions.
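For the coin example, this predictive average can be approximated by Monte Carlo: draw θ samples from the posterior and average the predictions. A sketch assuming a hypothetical Beta(5, 3) posterior, for which the integral also has the closed form $\mathbb{E}[\theta] = a/(a+b)$:

```python
import random

# Hypothetical Beta posterior over the coin's bias (values chosen for illustration)
a, b = 5, 3

# Closed form: p(next toss = 1 | data) = E[theta] = a / (a + b)
exact = a / (a + b)

# Monte Carlo approximation of the same integral:
# p(y* | X) = ∫ p(y* | theta) p(theta | X) dtheta ≈ (1/S) Σ_s p(y* | theta_s)
rng = random.Random(0)
S = 100_000
mc = sum(rng.betavariate(a, b) for _ in range(S)) / S

print(exact)                    # 0.625
print(abs(mc - exact) < 0.01)   # True; the Monte Carlo error shrinks as S grows
```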
Other Benefits of Bayesian Modeling
Hierarchical model construction: parameters can depend on hyperparameters, and hyperparameters need not be tuned by hand but can be inferred from the data, e.g., by maximizing the marginal likelihood p(X | α).
This provides robustness, e.g., when learning the sparsity hyperparameter in sparse regression, or learning the kernel hyperparameters in kernel methods.
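For the Beta-Bernoulli model, the marginal likelihood is available in closed form, $p(X \mid a, b) = B(a + k,\, b + N - k)/B(a, b)$ for k heads in N tosses, so the hyperparameters can be chosen by maximizing it. A sketch with a made-up data set and grid (the grid and its range are arbitrary choices for illustration):

```python
import math

def log_marginal_likelihood(xs, a, b):
    """log p(X | a, b) for Bernoulli data under a Beta(a, b) prior:
    p(X | a, b) = B(a + k, b + N - k) / B(a, b), with k heads out of N."""
    k, n = sum(xs), len(xs)
    log_beta = lambda p, q: math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return log_beta(a + k, b + n - k) - log_beta(a, b)

xs = [1, 1, 0, 1, 1, 1, 0, 1]  # 6 heads out of 8

# Empirical-Bayes style hyperparameter choice: maximize p(X | a, b) over a grid
grid = [(a / 2, b / 2) for a in range(1, 21) for b in range(1, 21)]
a_hat, b_hat = max(grid, key=lambda ab: log_marginal_likelihood(xs, *ab))
print(a_hat, b_hat)  # a prior whose mean sits near the empirical rate 0.75
```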
Other Benefits of Probabilistic/Bayesian Modeling
Can introduce local parameters (latent variables) associated with each data point and infer those as well.
Used in many problems: Gaussian mixture models, probabilistic principal component analysis, factor analysis, topic models.
Other Benefits of Probabilistic/Bayesian Modeling
Enables a modular architecture: simple models can be neatly combined to solve more complex problems.
Allows jointly learning across multiple data sets (sometimes also known as multitask learning or transfer learning).
Other Benefits of Bayesian Modeling
Nonparametric Bayesian modeling: a principled way to learn the model size, e.g., how many clusters (Gaussian mixture model or graph clustering), how many basis vectors (PCA) or dictionary elements (sparse coding or dictionary learning), how many topics (topic models such as LDA), etc.
Nonparametric Bayesian modeling allows the model size to grow with the data.
Other Benefits of Bayesian Modeling
Sequential data acquisition, or active learning: we can check how confident the learned model is w.r.t. a new data point. For Bayesian linear regression:
    $p(\theta \mid \lambda) = \text{Normal}(\theta \mid 0, \lambda^2)$  (prior)
    $p(y \mid x, \theta) = \text{Normal}(y \mid \theta^{\top} x, \sigma^2)$  (likelihood)
    $p(\theta \mid Y, X) = \text{Normal}(\theta \mid \mu_{\theta}, \Sigma_{\theta})$  (posterior)
    $p(y_0 \mid x_0, Y, X) = \text{Normal}(y_0 \mid \mu_0, \sigma_0^2)$  (predictive distribution)
    $\mu_0 = \mu_{\theta}^{\top} x_0$  (predictive mean)
    $\sigma_0^2 = \sigma^2 + x_0^{\top} \Sigma_{\theta}\, x_0$  (predictive variance)
This gives a strategy to choose data points sequentially for improved learning under a budget on the amount of data available.
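A minimal sketch of these formulas for a scalar weight (the data and hyperparameter values are made up): the predictive variance grows away from the training inputs, which is exactly the signal active learning exploits by querying the point with the largest $\sigma_0^2$:

```python
def posterior(xs, ys, lam2=10.0, sigma2=0.25):
    """Posterior over a scalar weight theta for y = theta * x + noise.
    Prior: Normal(0, lam2). Likelihood: Normal(theta * x, sigma2)."""
    precision = 1.0 / lam2 + sum(x * x for x in xs) / sigma2
    var_theta = 1.0 / precision
    mean_theta = var_theta * sum(x * y for x, y in zip(xs, ys)) / sigma2
    return mean_theta, var_theta

def predictive(x0, mean_theta, var_theta, sigma2=0.25):
    """Predictive mean mu_0 = mean_theta * x0 and variance sigma_0^2 = sigma2 + x0^2 * var_theta."""
    return mean_theta * x0, sigma2 + x0 * x0 * var_theta

# Noiseless data from a true slope of 2, on inputs near the origin
xs = [0.5, 1.0, 1.5, 2.0]
ys = [2 * x for x in xs]
m, v = posterior(xs, ys)

mu_near, var_near = predictive(1.0, m, v)   # inside the training range
mu_far, var_far = predictive(10.0, m, v)    # far from the training data

print(round(m, 2))          # close to the true slope 2
print(var_far > var_near)   # True: the model is less confident far from the data
```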
Next Talk
- Case study on Bayesian sparse linear regression
- Hyperparameter estimation
- Introduction to approximate Bayesian inference
Thanks! Questions?
Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 22, 2011 Today: MLE and MAP Bayes Classifiers Naïve Bayes Readings: Mitchell: Naïve Bayes and Logistic
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationProbabilistic Machine Learning. Industrial AI Lab.
Probabilistic Machine Learning Industrial AI Lab. Probabilistic Linear Regression Outline Probabilistic Classification Probabilistic Clustering Probabilistic Dimension Reduction 2 Probabilistic Linear
More informationIntroduction to Bayesian inference
Introduction to Bayesian inference Thomas Alexander Brouwer University of Cambridge tab43@cam.ac.uk 17 November 2015 Probabilistic models Describe how data was generated using probability distributions
More informationBayesian Inference. p(y)
Bayesian Inference There are different ways to interpret a probability statement in a real world setting. Frequentist interpretations of probability apply to situations that can be repeated many times,
More informationBayesian Learning (II)
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP
More informationLecture 3. Linear Regression II Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationThe Bayes classifier
The Bayes classifier Consider where is a random vector in is a random variable (depending on ) Let be a classifier with probability of error/risk given by The Bayes classifier (denoted ) is the optimal
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationSome slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2
Logistics CSE 446: Point Estimation Winter 2012 PS2 out shortly Dan Weld Some slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2 Last Time Random variables, distributions Marginal, joint & conditional
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationKernel methods, kernel SVM and ridge regression
Kernel methods, kernel SVM and ridge regression Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Collaborative Filtering 2 Collaborative Filtering R: rating matrix; U: user factor;
More informationBayesian RL Seminar. Chris Mansley September 9, 2008
Bayesian RL Seminar Chris Mansley September 9, 2008 Bayes Basic Probability One of the basic principles of probability theory, the chain rule, will allow us to derive most of the background material in
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationLecture 3. Univariate Bayesian inference: conjugate analysis
Summary Lecture 3. Univariate Bayesian inference: conjugate analysis 1. Posterior predictive distributions 2. Conjugate analysis for proportions 3. Posterior predictions for proportions 4. Conjugate analysis
More informationClustering bi-partite networks using collapsed latent block models
Clustering bi-partite networks using collapsed latent block models Jason Wyse, Nial Friel & Pierre Latouche Insight at UCD Laboratoire SAMM, Université Paris 1 Mail: jason.wyse@ucd.ie Insight Latent Space
More informationBayesian Methods. David S. Rosenberg. New York University. March 20, 2018
Bayesian Methods David S. Rosenberg New York University March 20, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 March 20, 2018 1 / 38 Contents 1 Classical Statistics 2 Bayesian
More informationCS-E3210 Machine Learning: Basic Principles
CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction
More informationBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet Process Alessandro Panella Department of Computer Science University of Illinois at Chicago Machine Learning Seminar Series February 18, 2013 Alessandro
More informationLecture 4. Generative Models for Discrete Data - Part 3. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza.
Lecture 4 Generative Models for Discrete Data - Part 3 Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza October 6, 2017 Luigi Freda ( La Sapienza University) Lecture 4 October 6, 2017 1 / 46 Outline
More informationBayesian Inference: Posterior Intervals
Bayesian Inference: Posterior Intervals Simple values like the posterior mean E[θ X] and posterior variance var[θ X] can be useful in learning about θ. Quantiles of π(θ X) (especially the posterior median)
More informationIntroduc)on to Bayesian Methods
Introduc)on to Bayesian Methods Bayes Rule py x)px) = px! y) = px y)py) py x) = px y)py) px) px) =! px! y) = px y)py) y py x) = py x) =! y "! y px y)py) px y)py) px y)py) px y)py)dy Bayes Rule py x) =
More informationGWAS IV: Bayesian linear (variance component) models
GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian
More informationIntroduction to Bayesian Learning. Machine Learning Fall 2018
Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability
More informationVariational Scoring of Graphical Model Structures
Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003 Overview Bayesian model selection Approximations using Variational
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 5 Bayesian Learning of Bayesian Networks CS/CNS/EE 155 Andreas Krause Announcements Recitations: Every Tuesday 4-5:30 in 243 Annenberg Homework 1 out. Due in class
More information