CS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning
1 CS242: Probabilistic Graphical Models, Lecture 4A: MAP Estimation & Graph Structure Learning. Professor Erik Sudderth, Brown University Computer Science, October 4, 2016. Some figures and materials courtesy David Barber, Bayesian Reasoning and Machine Learning.
2 CS242: Lecture 4A Outline
- MAP Estimation and Gaussian Priors
- Graph Structure Learning via Sparse Regression
3 Degeneracies in ML Estimation
\hat{\theta} = \arg\max_\theta \prod_{\ell=1}^L p(x^{(\ell)} \mid \theta) = \arg\max_\theta \sum_{\ell=1}^L \log p(x^{(\ell)} \mid \theta)
- The theory justifying maximum likelihood (ML) estimates is asymptotic: good properties as L becomes very large.
- But they can have poor properties with small datasets. Example: the ML estimate of a Bernoulli with no observed heads,
  \hat{\theta} = \frac{1}{L} \sum_{\ell=1}^L x^{(\ell)} = 0 \text{ if } x^{(\ell)} = 0 \text{ for all } \ell,
  assumes observing heads in the future is impossible!
- More generally, ML estimates can often give parameter estimates that are too extreme (too large or too small).
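The degeneracy above is easy to reproduce numerically; a minimal numpy sketch (helper name is mine, not from the lecture):

```python
import numpy as np

def bernoulli_ml(x):
    """ML estimate of a Bernoulli parameter: the sample mean of binary data."""
    return float(np.mean(x))

# Five tosses, no heads observed: the ML estimate is exactly 0,
# i.e. it assigns zero probability to ever seeing heads in the future.
x = np.zeros(5)
mu_ml = bernoulli_ml(x)
```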
4 Bayesian Parameter Estimation
- Suppose I have L independent observations sampled from some unknown probability distribution: x = \{x^{(1)}, \ldots, x^{(L)}\}
- We have a likelihood model with unknown parameters \theta: p(x \mid \theta) = \prod_{\ell=1}^L p(x^{(\ell)} \mid \theta)
- We have a prior distribution on parameters (possible models): p(\theta)
- Posterior distribution on parameters, given data:
  p(\theta \mid x) = \frac{1}{p(x)} p(\theta) \prod_{\ell=1}^L p(x^{(\ell)} \mid \theta)
5 Bayesian Parameter Estimation
- Posterior distribution on parameters, given data:
  p(\theta \mid x) = \frac{1}{p(x)} p(\theta) \prod_{\ell=1}^L p(x^{(\ell)} \mid \theta)
- Maximum a posteriori (MAP) parameter estimate: choose the parameters with largest posterior probability.
  \hat{\theta} = \arg\max_\theta p(\theta \mid x) = \arg\max_\theta p(\theta) \prod_{\ell=1}^L p(x^{(\ell)} \mid \theta)
- Conditional expectation parameter estimate: set the parameters to the mean of the posterior distribution.
  \hat{\theta} = \mathbb{E}[\theta \mid x] = \int \theta \, p(\theta \mid x) \, d\theta
6 Priors for Discrete Exponential Families
p(x \mid \theta) = \exp\{\theta^T \phi(x) - \Phi(\theta)\}, \quad \Phi(\theta) = \log \sum_x \exp\{\theta^T \phi(x)\}
- For discrete variables, features are indicator functions: \phi(x) \in \{0,1\}^d
- Assuming a finite number of variables, the distribution is normalizable for any parameters: \Theta = \mathbb{R}^d
- First derivatives of the log normalization constant are event probabilities:
  \nabla \Phi(\theta) = \mathbb{E}_\theta[\phi(x)] = \mu \in [0,1]^d, \quad \mu_k = \Pr_\theta[\phi_k(x) = 1]
- What priors p(\theta) on exponential family parameters would favor simple discrete distributions?
7 Example: Bernoulli Distribution
Bernoulli distribution: single toss of a (possibly biased) coin.
\text{Ber}(x \mid \mu) = \mu^x (1-\mu)^{1-x}, \quad x \in \{0,1\}, \quad \mathbb{E}[x \mid \mu] = \Pr[x = 1] = \mu, \quad 0 \le \mu \le 1
Exponential family form: \text{Ber}(x \mid \theta) = \exp\{\theta x - \Phi(\theta)\}, \quad \Theta = \mathbb{R}
Logistic function: \mu = \frac{1}{1 + e^{-\theta}}, \quad \theta = \log \frac{\mu}{1 - \mu}
If \theta = 0 then \mu = 0.5. If \theta \ll 0 then \mu \to 0. If \theta \gg 0 then \mu \to 1.
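The logistic link and its log-odds inverse can be checked numerically; a small sketch (function names are mine, not the lecture's):

```python
import numpy as np

def mu_from_theta(theta):
    """Logistic function: mean parameter mu from natural parameter theta."""
    return 1.0 / (1.0 + np.exp(-theta))

def theta_from_mu(mu):
    """Log-odds: natural parameter theta from mean parameter mu."""
    return np.log(mu / (1.0 - mu))

# theta = 0 gives a fair coin, and the two maps invert each other.
mu0 = mu_from_theta(0.0)
roundtrip = theta_from_mu(mu_from_theta(2.3))
```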
8 Example: Pair of Binary Variables
p(x) = \exp\{\theta_1 x_1 + \theta_2 x_2 + \theta_{12} x_1 x_2 - \Phi(\theta)\}, \quad x_i \in \{0,1\}
Represent an arbitrary joint distribution on two bits with three statistics:
\mu_1 = \mathbb{E}[x_1] = \Pr[x_1 = 1], \quad \mu_2 = \mathbb{E}[x_2] = \Pr[x_2 = 1], \quad \mu_{12} = \mathbb{E}[x_1 x_2] = \Pr[x_1 = 1, x_2 = 1]
Note that the bits are independent when \theta_{12} = 0:
p(x) \propto \exp\{\theta_1 x_1\} \exp\{\theta_2 x_2\} \propto p(x_1) p(x_2)
The degree of dependence (positive or negative correlation) becomes large when |\theta_{12}| \gg 0.
The realizable mean parameters (\mu_1, \mu_2, \mu_{12}) lie in \text{conv}\{(0,0,0), (1,0,0), (0,1,0), (1,1,1)\}.
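Because this joint has only four states, the mean parameters can be computed by brute-force enumeration; a sketch (the helper name is mine):

```python
import numpy as np
from itertools import product

def two_bit_moments(t1, t2, t12):
    """Exact mu1, mu2, mu12 for p(x) proportional to exp{t1*x1 + t2*x2 + t12*x1*x2}."""
    states = list(product([0, 1], repeat=2))
    weights = np.array([np.exp(t1 * x1 + t2 * x2 + t12 * x1 * x2)
                        for x1, x2 in states])
    p = weights / weights.sum()            # normalize over the 4 joint states
    mu1 = sum(pi for pi, (x1, _) in zip(p, states) if x1 == 1)
    mu2 = sum(pi for pi, (_, x2) in zip(p, states) if x2 == 1)
    mu12 = p[states.index((1, 1))]
    return mu1, mu2, mu12

# With theta_12 = 0 the bits are independent, so mu12 = mu1 * mu2 exactly.
m1, m2, m12 = two_bit_moments(0.5, -1.0, 0.0)
```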
9 Factor Graphs & Exponential Families
p(x) = \frac{1}{Z(\theta)} \prod_{f \in \mathcal{F}} \psi_f(x_f \mid \theta_f)
\mathcal{F}: set of hyperedges f \subseteq \mathcal{V} linking subsets of nodes. \mathcal{V}: set of N nodes or vertices, \{1, 2, \ldots, N\}.
A factor graph is created from non-negative potentials:
\psi_f(x_f \mid \theta_f) = \exp\{\theta_f^T \phi_f(x_f)\}, \quad \phi_f(x_f) \in \mathbb{R}^{d_f}
By setting exponential family parameters to zero, we remove factors and simplify the graphical model:
if \theta_f = 0 then \psi_f(x_f \mid \theta_f = 0) = 1 for all x_f.
10 MAP Learning with Gaussian Priors
p(\theta) = \mathcal{N}(\theta \mid 0, \lambda^{-1} I)
Objective to minimize:
-\log p(\theta) = \frac{\lambda}{2} \theta^T \theta + \text{constant}
f(\theta) = -\log p(x \mid \theta) = -\sum_{\ell=1}^L \log p(x^{(\ell)} \mid \theta)
\tilde{f}(\theta) = -\log p(x \mid \theta) - \log p(\theta) = f(\theta) + \frac{\lambda}{2} \theta^T \theta \quad \text{(plus constants)}
Gradient: \nabla \tilde{f}(\theta) = \nabla f(\theta) + \lambda \theta. Hessian: \nabla^2 \tilde{f}(\theta) = \nabla^2 f(\theta) + \lambda I.
- Including a Gaussian prior on parameters, or equivalently adding L_2 regularization, is a simple modification to the gradient vector & Hessian matrix for any model.
- Biases the exponential family towards weak factors, unless strong dependence is justified by data.
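The two modifications above are model-agnostic, so they can be wrapped around any likelihood's gradient and Hessian; a toy numpy sketch with a quadratic "likelihood" f(theta) = 0.5*||theta - b||^2 (all names here are illustrative, not from the lecture):

```python
import numpy as np

def map_gradient(grad_f, theta, lam):
    """Gradient of f(theta) + (lam/2) * theta^T theta."""
    return grad_f(theta) + lam * theta

def map_hessian(hess_f, theta, lam):
    """Hessian of f(theta) + (lam/2) * theta^T theta."""
    return hess_f(theta) + lam * np.eye(len(theta))

# Quadratic toy likelihood: unregularized minimum at b, MAP minimum at b/(1+lam).
b = np.array([2.0, -3.0])
grad_f = lambda th: th - b
hess_f = lambda th: np.eye(2)
lam = 1.0

# One Newton step from 0 solves this quadratic MAP problem exactly,
# landing on b / (1 + lam): the estimate is shrunk toward zero.
theta = np.zeros(2)
theta = theta - np.linalg.solve(map_hessian(hess_f, theta, lam),
                                map_gradient(grad_f, theta, lam))
```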
11 MAP Learning for Undirected Models
An undirected graph encodes dependencies within a single training example:
p(x \mid \theta) = \prod_{\ell=1}^L \frac{1}{Z(\theta)} \prod_{f \in \mathcal{F}} \psi_f(x_f^{(\ell)} \mid \theta_f), \quad x = \{x^{(1)}, \ldots, x^{(L)}\}
Given L independent, identically distributed, completely observed samples:
\log p(\theta \mid x) = C + \left[ \sum_{\ell=1}^L \sum_{f \in \mathcal{F}} \theta_f^T \phi_f(x_f^{(\ell)}) \right] - L \Phi(\theta) - \frac{\lambda}{2} \theta^T \theta
Take the gradient with respect to the parameters of a single factor:
\nabla_{\theta_f} \log p(\theta \mid x) = \left[ \sum_{\ell=1}^L \phi_f(x_f^{(\ell)}) \right] - L \, \mathbb{E}_\theta[\phi_f(x_f)] - \lambda \theta_f
12 MAP for Bernoulli with Gaussian Prior
p(\theta) = \mathcal{N}(\theta \mid 0, \lambda^{-1}), \quad p(x \mid \theta) = \text{Ber}(x \mid \sigma(\theta)), \quad \mu = \sigma(\theta) = \frac{1}{1 + e^{-\theta}}
Goal: maximize the log-posterior distribution
\mathcal{L}(\theta) = s\theta - L\Phi(\theta) - \frac{\lambda}{2}\theta^2, \quad s = \sum_{\ell=1}^L x^{(\ell)}
Gradient: \frac{d\mathcal{L}(\theta)}{d\theta} = s - L\sigma(\theta) - \lambda\theta
Gradient ascent update:
\theta^{(k+1)} = \theta^{(k)} + \alpha \nabla \mathcal{L}(\theta^{(k)}) = \theta^{(k)} - \alpha\lambda\theta^{(k)} + \alpha(s - L\sigma(\theta^{(k)}))
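The gradient ascent update above can be run directly; a minimal sketch (step size and iteration count are my choices, not from the lecture), revisiting the all-tails dataset from slide 3:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def bernoulli_map(x, lam=1.0, step=0.01, iters=5000):
    """Gradient ascent on L(theta) = s*theta - L*Phi(theta) - (lam/2)*theta^2,
    where s is the number of observed heads and L the number of tosses."""
    s, L = float(np.sum(x)), len(x)
    theta = 0.0
    for _ in range(iters):
        theta += step * (s - L * sigmoid(theta) - lam * theta)
    return theta

x = np.zeros(5)                 # five tosses, no heads observed
theta_hat = bernoulli_map(x)
mu_hat = sigmoid(theta_hat)
# Unlike the ML estimate (mu = 0), the MAP estimate keeps mu strictly
# positive: the Gaussian prior pulls theta toward 0, i.e. mu toward 0.5.
```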
13 What about Incomplete Data?
Known: fixed graph structure and exponential family features:
p(x \mid \theta) = \exp\left\{ \sum_{f \in \mathcal{F}} \theta_f^T \phi_f(x_f) - \Phi(\theta) \right\}
x_i^{(\ell)} = variable at node i for observation \ell
Unknown: numeric values of the parameters.
Types of data used for parameter learning (picture an L \times N matrix of observations x_i^{(\ell)}, with L observations over N nodes):
- Complete data: every node is observed in every sample.
- Incomplete (partial) data: some entries of each sample are missing.
14 Reminder: EM for ML Learning
\ln p(x \mid \theta) = \ln \sum_z p(x, z \mid \theta)
\ln p(x \mid \theta) \ge \sum_z q(z) \ln p(x, z \mid \theta) - \sum_z q(z) \ln q(z) \triangleq \mathcal{L}(q, \theta)
- Initialization: randomly select starting parameters \theta^{(0)}
- E-Step: given parameters, infer the posterior of the hidden data:
  q^{(t)} = \arg\max_q \mathcal{L}(q, \theta^{(t-1)}) = p(z \mid x, \theta^{(t-1)})
- M-Step: given posterior distributions, learn parameters that fit the data:
  \theta^{(t)} = \arg\max_\theta \mathcal{L}(q^{(t)}, \theta) = \arg\max_\theta \sum_z q^{(t)}(z) \ln p(x, z \mid \theta)
- Iteration: alternate E-step & M-step until convergence.
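The E/M alternation can be illustrated on a model not covered in this lecture's slides: a sketch of EM for a two-component 1-D Gaussian mixture with known unit variances and equal weights, where only the means are hidden-variable-weighted averages (all settings here are my own illustrative choices):

```python
import numpy as np

def em_gmm_1d(x, mu_init=(-1.0, 1.0), iters=100):
    """EM for a two-component 1-D Gaussian mixture, unit variances,
    equal mixing weights; only the two means are learned."""
    mu = np.array(mu_init, dtype=float)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means maximize the expected
        # complete-data log-likelihood
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    return mu

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])
mu_hat = np.sort(em_gmm_1d(x))   # should recover means near -2 and +2
```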
15 EM for MAP Learning
Up to a constant independent of \theta,
\ln p(\theta \mid x) = \ln p(\theta) + \ln p(x \mid \theta) = \ln p(\theta) + \ln \sum_z p(x, z \mid \theta)
\ge \ln p(\theta) + \sum_z q(z) \ln p(x, z \mid \theta) - \sum_z q(z) \ln q(z) \triangleq \mathcal{L}(q, \theta)
- Initialization: randomly select starting parameters \theta^{(0)}
- E-Step: given parameters, infer the posterior of the hidden data, exactly as in EM for ML:
  q^{(t)}(z) = p(z \mid x, \theta^{(t-1)})
- M-Step: weighted MAP learning of parameters that fit the data:
  \theta^{(t)} = \arg\max_\theta \log p(\theta) + \sum_z q^{(t)}(z) \ln p(x, z \mid \theta)
- Iteration: alternate E-step & M-step until convergence.
16 CS242: Lecture 4A Outline
- MAP Estimation and Gaussian Priors
- Graph Structure Learning via Sparse Regression
17 Learning Graphical Model Structures
Over-fitting: maximum likelihood always prefers fully-connected graphs.
Strategy 1: Place a hard limit on the graph's structural complexity.
- Need to balance expressiveness with learning & inference tractability.
- Key example: optimize over all (pairwise) tree-structured distributions.
Strategy 2: Define a penalized likelihood which encourages simpler graphs.
- Interpretable as assigning a prior on models, and finding the posterior mode.
- Classic approach: search over graph structures.
- Modern approach: optimization with penalties that encourage sparsity.
Strategy 3: Bayesian model selection via marginal likelihoods of data.
- Better in principle than simple penalties, but often intractable.
- Revisit later in the course, once we've developed more sophisticated algorithms for approximate learning and inference.
18 Factor Graphs & Exponential Families
p(x) = \frac{1}{Z(\theta)} \prod_{f \in \mathcal{F}} \psi_f(x_f \mid \theta_f)
Graph structure selection is feature selection! We want to determine which factors (features) should be used, and which should be discarded (assigned zero weight).
A factor graph is created from non-negative potentials:
\psi_f(x_f \mid \theta_f) = \exp\{\theta_f^T \phi_f(x_f)\}, \quad \phi_f(x_f) \in \mathbb{R}^{d_f}
By setting exponential family parameters to zero, we remove factors and simplify the graphical model:
if \theta_f = 0 then \psi_f(x_f \mid \theta_f = 0) = 1 for all x_f.
19 Laplace Distribution
\text{Lap}(\theta \mid 0, \lambda) = \frac{\lambda}{2} \exp(-\lambda |\theta|)
(Figure: probability densities and log probability densities of the Gaussian and Laplace distributions.)
When used as a zero-mean prior on vectors of model parameters:
- Compared to a Gaussian, a stronger bias that many parameters are near zero.
- When we find the MAP estimate, some weights are exactly zero.
- Learning is harder than for a Gaussian prior, but still convex.
20 Constrained Optimization
Laplacian prior, L_1 regularization, Lasso regression:
p(w) = \prod_{j=1}^D \text{Lap}(w_j \mid 0, \lambda), \quad f(w) = \|y - Xw\|_2^2 \text{ subject to a bound on } \|w\|_1
Gaussian prior, L_2 regularization, Ridge regression:
p(w) = \prod_{j=1}^D \text{Norm}(w_j \mid 0, \sigma^2), \quad f(w) = \|y - Xw\|_2^2 \text{ subject to a bound on } \|w\|_2^2
Where do the level sets of the quadratic regression cost function first intersect the constraint set?
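The qualitative difference between the two penalties shows up clearly in code. A sketch (not from the lecture) contrasting the closed-form ridge solution with the lasso solved by iterative soft-thresholding (ISTA); the data, \lambda, and iteration counts are my own illustrative choices:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution (Gaussian prior / L2 penalty)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def lasso_ista(X, y, lam, iters=5000):
    """Lasso via iterative soft-thresholding (Laplacian prior / L1 penalty)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        g = X.T @ (X @ w - y)                   # gradient of 0.5 * ||y - Xw||^2
        w = w - step * g
        # proximal operator of lam * ||w||_1: soft-thresholding, exact zeros
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[:3] = [3.0, -2.0, 1.5]                   # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=100)
w_l2 = ridge(X, y, lam=10.0)
w_l1 = lasso_ista(X, y, lam=10.0)
# Ridge shrinks all weights but leaves them nonzero; the lasso sets
# most of the irrelevant weights exactly to zero.
```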
21 Gradient-Based Optimization
Laplacian prior, L_1 regularization, Lasso regression: p(\theta) = \prod_{j=1}^D \text{Lap}(\theta_j \mid 0, \lambda)
Gaussian prior, L_2 regularization, Ridge regression: p(\theta) = \prod_{j=1}^D \text{Norm}(\theta_j \mid 0, \sigma^2)
Objective function: \tilde{f}(\theta) = f(\theta) - \log p(\theta)
(Figure: negative gradients of the two penalties.)
(Informal intuition: the gradient of the L_1 objective is not defined at zero.)
22 Generalized Norms: Bridge Regression
-\log p(x \mid \theta) + \lambda \sum_j |\theta_j|^b
\text{ExpPower}(\theta \mid \mu, a, b) = \frac{b}{2a\,\Gamma(1/b)} \exp\left\{ -\left|\frac{\theta - \mu}{a}\right|^b \right\}
(Figure: density shapes for b = 2, b = 1, b = 0.3.)
- Convex objective function (true norm): b \ge 1
- Encourages sparse solutions (cusp at zero): b \le 1
- Lasso/Laplacian (convex & sparsifying): b = 1
- Ridge/Gaussian (classical, closed-form solutions): b = 2
- Sparsity via discrete feature count (greedy search): b \to 0
23 Bayesian Regression: 0 Observations
(Figure: prior over weights and corresponding sampled functions in data space, for the linear model p(y \mid x, w) = \mathcal{N}(y \mid w_0 + w_1 x, \sigma^2).)
24 Bayesian Regression: 1 Observation
(Figure: likelihood of the single observation, updated posterior over weights, and sampled functions in data space, for the linear model p(y \mid x, w) = \mathcal{N}(y \mid w_0 + w_1 x, \sigma^2).)
25 Regression Posteriors with Sparse Priors
-\log p(x \mid \theta) + \lambda \sum_j |\theta_j|^b, \quad \text{ExpPower}(\theta \mid \mu, a, b) = \frac{b}{2a\,\Gamma(1/b)} \exp\left\{ -\left|\frac{\theta - \mu}{a}\right|^b \right\}
(Figure: prior and posterior contours for b = 2, b = 1, and b = 0.4.)
26 Regularization Paths
Prostate cancer dataset with N = 67 and D = 8 features: lcavol, lweight, age, lbph, svi, lcp, gleason, pgg45.
(Figure: regularization paths for ridge, a bound on the L_2 norm, and the lasso, a bound on the L_1 norm.)
- The horizontal axis increases the bound on the weights (less regularization).
- For each bound, we plot the values of the estimated feature weights.
- Vertical lines mark the models chosen by cross-validation.
27 Optimization: Projected Gradient
- Generic method based on gradient & projection operators: take a gradient step, then project the result back onto the constraint set.
- Projection onto a non-negativity constraint is easy: \theta_k = \max(w_k, 0)
- Guaranteed to converge to the global minimum of any convex function on a convex set.
- Extensions modify the descent directions for faster convergence.
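A minimal sketch of the method for non-negative least squares, where the projection is exactly the elementwise clip above (the data and step-size rule are my own illustrative choices, not from the lecture):

```python
import numpy as np

def projected_gradient_nnls(X, y, iters=2000):
    """Minimize 0.5 * ||y - Xw||^2 subject to w >= 0 by projected gradient."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2       # 1 / Lipschitz constant
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = w - step * (X.T @ (X @ w - y))       # gradient step
        w = np.maximum(w, 0.0)                   # projection onto {w >= 0}
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
# The unconstrained optimum is [1, -1, 2, 0]; the constraint forces w[1] = 0.
y = X @ np.array([1.0, -1.0, 2.0, 0.0])
w = projected_gradient_nnls(X, y)
```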
28 Sparse Learning for Undirected Models
\log p(x \mid \theta) = \sum_{\ell=1}^L \sum_{f \in \mathcal{F}} \theta_f^T \phi_f(x_f^{(\ell)}) - L \Phi(\theta)
\log p(\theta \mid x) = C + \left[ \sum_{\ell=1}^L \sum_{f \in \mathcal{F}} \theta_f^T \phi_f(x_f^{(\ell)}) \right] - L \Phi(\theta) - \lambda \|\theta\|_1
- Standard software packages for L_1-regularized learning assume we can evaluate the objective function and its gradient in closed form. This is possible assuming inference is tractable in the model with all features.
- Pseudo-likelihood estimators & variational estimators approximate the true likelihood, but can scale to features where exact inference is intractable.
- We can replace L_1 with fancier penalties to encourage blocks of parameters to simultaneously be set to zero.
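One tractable instance of the pseudo-likelihood idea is neighborhood selection: fit an L_1-regularized logistic regression of each binary node on all the others, and propose an edge wherever a weight is nonzero. A sketch in that spirit (all function names, data, and settings are mine, not from the lecture):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def l1_logistic(X, y, lam, step=0.5, iters=4000):
    """L1-regularized logistic regression (unpenalized intercept),
    fit by proximal gradient descent with soft-thresholding."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(iters):
        p = sigmoid(X @ w + b)
        w = w - step * (X.T @ (p - y)) / n       # logistic-loss gradient step
        b = b - step * np.mean(p - y)
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w

def neighborhood_edges(samples, lam=0.05):
    """Regress each binary node on all others; nonzero weights propose edges."""
    d = samples.shape[1]
    edges = set()
    for i in range(d):
        others = [j for j in range(d) if j != i]
        w = l1_logistic(samples[:, others], samples[:, i], lam)
        edges |= {tuple(sorted((i, j))) for j, wj in zip(others, w) if wj != 0.0}
    return edges

# Toy data: x1 copies x0 with 10% flips; x2 is independent coin flips.
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 2000)
x1 = np.where(rng.random(2000) < 0.1, 1 - x0, x0)
x2 = rng.integers(0, 2, 2000)
edges = neighborhood_edges(np.column_stack([x0, x1, x2]).astype(float))
# The sparse regressions should link x0 and x1, and leave x2 isolated.
```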
29 Example: Word Usage in Newsgroups
L_1 regularization (\lambda = 256; isolated nodes are not plotted).
(Figure: learned graph over words such as god, government, israel, baseball, hockey, space, windows, software; edges link words that tend to co-occur in postings.)
Schmidt 2010 PhD thesis: presence of 100 words across 16,242 postings to 20 newsgroups.
30 Example: Word Usage in Newsgroups
L_1 regularization (\lambda = 512; isolated nodes are not plotted).
(Figure: learned graph over the same words; the stronger penalty yields a much sparser graph.)
Schmidt 2010 PhD thesis: presence of 100 words across 16,242 postings to 20 newsgroups.
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the
More informationClustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning
Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 5, 06 Reading: See class website Eric Xing @ CMU, 005-06
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationRelevance Vector Machines
LUT February 21, 2011 Support Vector Machines Model / Regression Marginal Likelihood Regression Relevance vector machines Exercise Support Vector Machines The relevance vector machine (RVM) is a bayesian
More informationCSci 8980: Advanced Topics in Graphical Models Gaussian Processes
CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 9: Variational Inference Relaxations Volkan Cevher, Matthias Seeger Ecole Polytechnique Fédérale de Lausanne 24/10/2011 (EPFL) Graphical Models 24/10/2011 1 / 15
More informationGenerative Clustering, Topic Modeling, & Bayesian Inference
Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week
More informationLinear Regression. Aarti Singh. Machine Learning / Sept 27, 2010
Linear Regression Aarti Singh Machine Learning 10-701/15-781 Sept 27, 2010 Discrete to Continuous Labels Classification Sports Science News Anemic cell Healthy cell Regression X = Document Y = Topic X
More informationClustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.
Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationToday. Statistical Learning. Coin Flip. Coin Flip. Experiment 1: Heads. Experiment 1: Heads. Which coin will I use? Which coin will I use?
Today Statistical Learning Parameter Estimation: Maximum Likelihood (ML) Maximum A Posteriori (MAP) Bayesian Continuous case Learning Parameters for a Bayesian Network Naive Bayes Maximum Likelihood estimates
More informationLearning Parameters of Undirected Models. Sargur Srihari
Learning Parameters of Undirected Models Sargur srihari@cedar.buffalo.edu 1 Topics Log-linear Parameterization Likelihood Function Maximum Likelihood Parameter Estimation Simple and Conjugate Gradient
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationWhy should you care about the solution strategies?
Optimization Why should you care about the solution strategies? Understanding the optimization approaches behind the algorithms makes you more effectively choose which algorithm to run Understanding the
More informationCPSC 340: Machine Learning and Data Mining
CPSC 340: Machine Learning and Data Mining MLE and MAP Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due tonight. Assignment 5: Will be released
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More information