11. Learning graphical models
1 Learning graphical models

Outline:
- Maximum likelihood
- Parameter learning
- Structural learning
- Learning partially observed graphical models
2 Learning graphical models 11-2

statistical inference addresses how to overcome computational challenges given a graphical model; learning addresses how to overcome both computational and statistical challenges to learn graphical models from data

the ultimate goal of learning could be (a) to uncover structure; (b) to solve a particular problem such as estimation, classification, or decision; or (c) to learn the joint distribution in order to make predictions, and the right learning algorithm depends on the particular goal
3 Learning graphical models 11-3

choosing the right model to learn is crucial, in the fundamental tradeoff between overfitting and underfitting, often called the bias-variance tradeoff

consider the task of learning a function over $x \in \mathbb{R}^d$ where $Y = f(x) + Z$ with i.i.d. noise $Z$; the goal is to predict $y$ from $x$ by learning the function $f$ from training data $\{(x_1, y_1), \dots, (x_n, y_n)\}$

the bias-variance tradeoff for $\mathrm{Err}(x)$ at a given $x$ is
$$\mathbb{E}[(Y - \hat{f}_{Y_1^n}(x))^2] = \underbrace{(\mathbb{E}[\hat{f}_{Y_1^n}(x)] - f(x))^2}_{\text{Bias}} + \underbrace{\mathbb{E}[(\hat{f}_{Y_1^n}(x) - \mathbb{E}[\hat{f}_{Y_1^n}(x)])^2]}_{\text{Variance}} + \underbrace{\mathbb{E}[(Y - f(x))^2]}_{\text{Irreducible error}}$$
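to make the decomposition concrete, here is a minimal numerical sketch (not from the lecture; the cubic target $f$, the noise level, and the polynomial degrees are illustrative assumptions) that estimates the bias and variance terms at a fixed test point by refitting polynomial estimators on many fresh training sets:

```python
import numpy as np

# Illustrative setup: f is a cubic, Z ~ N(0, sigma^2), and we compare
# polynomial least-squares estimators of increasing degree.
rng = np.random.default_rng(0)
f = lambda x: x - 0.5 * x**3
sigma, n, trials = 0.5, 20, 2000
x0 = 0.8  # the fixed test point x at which Err(x) is decomposed

for degree in (1, 3, 9):
    preds = np.empty(trials)
    for t in range(trials):
        x = rng.uniform(-2, 2, n)              # fresh training set each trial
        y = f(x) + sigma * rng.normal(size=n)
        preds[t] = np.polyval(np.polyfit(x, y, degree), x0)
    bias2 = (preds.mean() - f(x0)) ** 2        # (E[f_hat(x)] - f(x))^2
    var = preds.var()                          # E[(f_hat(x) - E[f_hat(x)])^2]
    print(f"degree {degree}: bias^2 = {bias2:.4f}, variance = {var:.4f}, "
          f"irreducible = {sigma**2:.4f}")
```

the low-degree fit shows large bias and small variance, the high-degree fit the reverse, matching the tradeoff above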
4 Learning graphical models 11-4

when the model is too complex, it overfits the training data and does not generalize to new data, which is referred to as high variance; when the model is too simple, it underfits the data and has high bias

common strategies to combat this are to (a) constrain the class of models or (b) penalize the model complexity; graphical models provide a hierarchy of model complexity to avoid overfitting/underfitting

in increasing level of difficulty, learning tasks can be classified as follows:
- we know the graph, or we impose a particular graph structure, and we need to learn the parameters
- we observe all the variables, and we need to infer the structure of the graph and the parameters
- the variables are partially observed, and we need to infer the structure
5 Maximum likelihood (Learning graphical models 11-5)

we want to learn a graphical model over $p$ nodes,
$$\mu(x) = \frac{1}{Z} \prod_{a \in F} \psi_a(x_{\partial a})$$
from $n$ i.i.d. samples $x^{(1)}, \dots, x^{(n)} \in \mathcal{X}^p$

it is more convenient to parametrize the compatibility functions as $\psi_a(x_{\partial a}) = e^{\theta_a(x_{\partial a})}$, and one approach is to use the maximum likelihood estimator, maximizing
$$L_n(\theta; G; \{x^{(\ell)}\}_{\ell \in [n]}) = \frac{1}{n} \sum_{\ell=1}^n \log \mu_\theta(x^{(\ell)}) = \frac{1}{n} \sum_{\ell \in [n]} \sum_{a \in F} \theta_a(x^{(\ell)}_{\partial a}) - \log Z(\theta; G)$$

this is a strictly concave function over $\theta = [\theta_a]_{a \in F} \in \mathbb{R}^{\sum_{a \in F} |\mathcal{X}|^{|\partial a|}}$; however, evaluating this function requires computing the log partition function, which is in general #P-hard, even if $G$ is given; if inference is efficient then learning is efficient
6 Learning graphical models 11-6

in the case of Ising models,
$$\mu(x) = \frac{1}{Z} \prod_{(i,j) \in E} e^{\theta_{ij} x_i x_j} \prod_{i \in V} e^{\theta_i x_i}$$
the log-likelihood function is
$$L_n(\theta; G; \{x^{(\ell)}\}_{\ell \in [n]}) = \sum_{(i,j) \in E} \theta_{ij} \underbrace{\frac{1}{n} \sum_{\ell=1}^n x_i^{(\ell)} x_j^{(\ell)}}_{\hat{M}_{ij}} + \sum_{i \in V} \theta_i \underbrace{\frac{1}{n} \sum_{\ell=1}^n x_i^{(\ell)}}_{\hat{M}_i} - \underbrace{\log Z(\theta; G)}_{\Phi(\theta)} = \langle \hat{M}, \theta \rangle - \Phi(\theta)$$
where $\hat{M} = [\hat{M}_{ij}, \dots, \hat{M}_i, \dots] \in \mathbb{R}^{|E|+p}$ with $\hat{M}_{ij} = \frac{1}{n} \sum_\ell x_i^{(\ell)} x_j^{(\ell)}$ and $\hat{M}_i = \frac{1}{n} \sum_\ell x_i^{(\ell)}$

this is a strictly concave function over $\theta = [\theta_{ij}, \dots, \theta_i, \dots] \in \mathbb{R}^{|E|+p}$
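as a rough sketch of what evaluating $L_n$ entails (assuming a toy graph small enough to enumerate all $2^p$ configurations; the cycle graph, the parameter values, and the placeholder data are illustrative assumptions, not from the lecture):

```python
import itertools
import numpy as np

# Illustrative 4-node Ising model on a cycle; x_i in {-1, +1}.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
p = 4

def log_partition(theta_edge, theta_node):
    # Brute-force log Z(theta; G) by enumerating all 2^p configurations.
    terms = []
    for x in itertools.product([-1, +1], repeat=p):
        e = sum(theta_edge[k] * x[i] * x[j] for k, (i, j) in enumerate(edges))
        e += sum(theta_node[i] * x[i] for i in range(p))
        terms.append(e)
    return np.logaddexp.reduce(terms)

def log_likelihood(theta_edge, theta_node, samples):
    # L_n = <M_hat, theta> - log Z, with M_hat the empirical moments.
    M_edge = np.array([np.mean(samples[:, i] * samples[:, j]) for i, j in edges])
    M_node = samples.mean(axis=0)
    return (M_edge @ theta_edge + M_node @ theta_node
            - log_partition(theta_edge, theta_node))

# Example usage with random +/-1 placeholder data.
rng = np.random.default_rng(1)
samples = rng.choice([-1, +1], size=(100, p))
print(log_likelihood(np.full(len(edges), 0.2), np.zeros(p), samples))
```

the brute-force log Z here is exactly the quantity that becomes intractable at scale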
7 Learning graphical models 11-7

this maximum likelihood estimator is consistent, i.e. let $\hat{\theta} = \arg\max_\theta L_n(\theta; G; \{x^{(\ell)}\})$, and let $\theta^*$ denote the true parameter; then
$$\lim_{n \to \infty} \hat{\theta} = \theta^*$$

Proof. by optimality of $\hat{\theta}$,
$$\langle \hat{M}, \hat{\theta} \rangle - \Phi(\hat{\theta}) \geq \langle \hat{M}, \theta^* \rangle - \Phi(\theta^*)$$
by strong convexity of $\Phi(\cdot)$, there exists $\gamma > 0$ such that
$$\Phi(\hat{\theta}) \geq \Phi(\theta^*) + \langle M, \hat{\theta} - \theta^* \rangle + \frac{\gamma}{2} \|\hat{\theta} - \theta^*\|^2$$
which follows from the fact that $\nabla \Phi(\theta^*) = M$, where $M_{ij} = \mathbb{E}_{\theta^*}[x_i x_j]$ and $M_i = \mathbb{E}_{\theta^*}[x_i]$; together we get
$$\langle \hat{M}, \hat{\theta} - \theta^* \rangle \geq \Phi(\hat{\theta}) - \Phi(\theta^*) \geq \langle M, \hat{\theta} - \theta^* \rangle + \frac{\gamma}{2} \|\hat{\theta} - \theta^*\|^2$$
it follows that
$$\|\hat{M} - M\| \, \|\hat{\theta} - \theta^*\| \geq \langle \hat{M} - M, \hat{\theta} - \theta^* \rangle \geq \frac{\gamma}{2} \|\hat{\theta} - \theta^*\|^2$$
8 Learning graphical models 11-8

we have $\|\hat{\theta} - \theta^*\| \leq \frac{2}{\gamma} \|\hat{M} - M\|$, which gives, with high probability,
$$\|\hat{\theta} - \theta^*\| \lesssim \frac{1}{\gamma} \sqrt{\frac{|E| + p}{n}}$$
where $\gamma = \inf_\theta \lambda_{\min}(\nabla^2 \Phi(\theta))$

computing the gradient of $\Phi$ is straightforward:
$$\Phi(\theta) = \log Z(\theta; G) = \log \sum_x \prod_{(i,j) \in E} e^{\theta_{ij} x_i x_j} \prod_{i \in V} e^{\theta_i x_i}$$
$$\frac{\partial \Phi(\theta)}{\partial \theta_{ij}} = \frac{1}{Z(\theta; G)} \sum_x \Big( \prod_{(i,j) \in E} e^{\theta_{ij} x_i x_j} \prod_{i \in V} e^{\theta_i x_i} \Big) x_i x_j = \mathbb{E}_\theta[x_i x_j] = M_{ij}$$

again, this is computationally challenging in general, but if such inference is efficient, then learning can also be efficient
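since the gradient of the log-likelihood is the moment mismatch $\hat{M} - M(\theta)$, ML estimation reduces to moment matching by gradient ascent whenever the model moments can be computed; a sketch continuing the toy Ising example above (it reuses `edges`, `p`, `itertools`, `np`, and `log_partition` from that block; the step size and iteration count are arbitrary illustrative choices):

```python
def model_moments(theta_edge, theta_node):
    # M(theta): E_theta[x_i x_j] and E_theta[x_i] by brute-force enumeration.
    logZ = log_partition(theta_edge, theta_node)
    M_edge, M_node = np.zeros(len(edges)), np.zeros(p)
    for x in itertools.product([-1, +1], repeat=p):
        e = sum(theta_edge[k] * x[i] * x[j] for k, (i, j) in enumerate(edges))
        e += sum(theta_node[i] * x[i] for i in range(p))
        w = np.exp(e - logZ)  # mu_theta(x)
        for k, (i, j) in enumerate(edges):
            M_edge[k] += w * x[i] * x[j]
        M_node += w * np.array(x)
    return M_edge, M_node

def fit_ising(samples, step=0.1, iters=1000):
    # Gradient ascent on L_n: theta <- theta + step * (M_hat - M(theta)).
    Mh_edge = np.array([np.mean(samples[:, i] * samples[:, j]) for i, j in edges])
    Mh_node = samples.mean(axis=0)
    th_e, th_n = np.zeros(len(edges)), np.zeros(p)
    for _ in range(iters):
        M_e, M_n = model_moments(th_e, th_n)
        th_e += step * (Mh_edge - M_e)   # small step size for stability
        th_n += step * (Mh_node - M_n)
    return th_e, th_n
```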
9 Learning graphical models 11-9

recall that
$$\mu(x) = \frac{1}{Z} \prod_{a \in F} \psi_a(x_{\partial a})$$
let $n(x) = \sum_{i=1}^n \mathbb{I}(x^{(i)} = x)$ and $n(x_{\partial a}) = \sum_{i=1}^n \mathbb{I}(x^{(i)}_{\partial a} = x_{\partial a})$; then the log-likelihood is
$$L(\psi; x^{(1)}, \dots, x^{(n)}) = \frac{1}{n} \sum_{i \in [n]} \sum_{a \in F} \log \psi_a(x^{(i)}_{\partial a}) - \log Z = \sum_{a \in F} \sum_{x_{\partial a}} \frac{n(x_{\partial a})}{n} \log \psi_a(x_{\partial a}) - \log Z$$
10 Learning graphical models 11-10

taking the derivative of $L(\psi; x^{(1)}, \dots, x^{(n)})$ and setting it to zero, the ML estimate satisfies the fixed-point equation $\mu_\psi(x_{\partial a}) = \hat{\mu}(x_{\partial a})$, i.e. the model marginals match the empirical marginals

if the graph is a tree, then one can find the ML estimate using the above equation; in general, it is computationally challenging, and one resorts to approximate solutions

iterative proportional fitting (IPF) updates the estimates iteratively using the above fixed-point equation:
$$\psi_a^{(t+1)}(x_{\partial a}) = \psi_a^{(t)}(x_{\partial a}) \, \frac{\hat{\mu}(x_{\partial a})}{\mu^{(t)}(x_{\partial a})}$$
although $\mu^{(t)}(x_{\partial a})$ is difficult to compute, IPF has the nice property that the partition function $Z^{(t)}$ in $\hat{\mu}^{(t)} = \frac{1}{Z^{(t)}} \prod_{a \in F} \psi_a^{(t)}$ is independent of $t$, so we can start with a distribution with known $Z^{(0)}$
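a minimal IPF sketch (assuming a pairwise model small enough that the model marginals $\mu^{(t)}(x_{\partial a})$ can be computed by brute-force enumeration; the chain, the uniform initialization, and the placeholder empirical marginals are illustrative assumptions):

```python
import itertools
import numpy as np

# Illustrative chain 0-1-2 over binary variables, one factor per edge,
# initialized uniform (so Z^(0) is known).
factors = {(0, 1): np.ones((2, 2)), (1, 2): np.ones((2, 2))}
p = 3

def model_marginal(factors, pair):
    # Brute-force pairwise marginal mu(x_i, x_j) of the current model.
    joint = np.zeros((2,) * p)
    for x in itertools.product([0, 1], repeat=p):
        joint[x] = np.prod([tab[x[i], x[j]] for (i, j), tab in factors.items()])
    joint /= joint.sum()
    return joint.sum(axis=tuple(k for k in range(p) if k not in pair))

def ipf(factors, emp_marginals, iters=50):
    # psi_a^(t+1)(x_a) = psi_a^(t)(x_a) * mu_hat(x_a) / mu^(t)(x_a)
    for _ in range(iters):
        for pair in factors:
            factors[pair] = (factors[pair] * emp_marginals[pair]
                             / model_marginal(factors, pair))
    return factors

# Placeholder empirical pairwise marginals (consistent on the shared node x_1).
emp = {(0, 1): np.array([[0.30, 0.20], [0.10, 0.40]]),
       (1, 2): np.array([[0.25, 0.15], [0.20, 0.40]])}
fit = ipf(factors, emp)
print(model_marginal(fit, (0, 1)))  # approaches emp[(0, 1)]
```

in practice computing $\mu^{(t)}(x_{\partial a})$ is the expensive inference step; it is exact here only because $p$ is tiny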
11 Parameter learning (Learning graphical models 11-11)

suppose $G$ is given, and we want to estimate the compatibility functions $\{\psi_a\}_{a \in F}$ from i.i.d. samples $x^{(1)}, \dots, x^{(n)} \in \mathcal{X}^p$ drawn from $\mu(x) = \frac{1}{Z} \prod_{a \in F} \psi_a(x_{\partial a})$

Example. a single node with $x \in \{0, 1\}$, and let
$$\mu(x) = \begin{cases} \theta & \text{for } x = 1 \\ 1 - \theta & \text{for } x = 0 \end{cases}$$
consider the estimator $\hat{\theta} = \frac{1}{n} \sum_{\ell=1}^n x^{(\ell)}$

Claim. For any $\varepsilon, \delta > 0$, let $n(\varepsilon, \delta)$ denote the minimum number of samples required to ensure that $P(|\hat{\theta} - \theta| > \varepsilon) \leq \delta$; then
$$n(\varepsilon, \delta) \leq \frac{1}{2\varepsilon^2} \log \frac{2}{\delta}$$
this follows directly from Hoeffding's inequality:
$$P\Big( \Big| \frac{1}{n} \sum_{\ell=1}^n x^{(\ell)} - \theta \Big| > \varepsilon \Big) \leq 2 \exp\{-2n\varepsilon^2\}$$
but it does not depend on the value of $\theta$
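a quick numerical sanity check of the claim (the values of $\varepsilon$, $\delta$, $\theta$ are arbitrary illustrative choices): compute $n(\varepsilon, \delta)$ from the bound and estimate the failure probability by Monte Carlo:

```python
import numpy as np

eps, delta, theta = 0.1, 0.05, 0.3  # illustrative choices
n = int(np.ceil(np.log(2 / delta) / (2 * eps**2)))  # n(eps, delta) from Hoeffding
print("n =", n)

rng = np.random.default_rng(0)
trials = 20000
x = rng.random((trials, n)) < theta                   # Bernoulli(theta) samples
fail = np.mean(np.abs(x.mean(axis=1) - theta) > eps)  # P(|theta_hat - theta| > eps)
print(f"empirical failure probability {fail:.4f} <= delta = {delta}")
```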
12 Learning graphical models 11-12

often we want $\varepsilon$ to be in the same range as $\theta$, and letting $\varepsilon = \Delta\theta$ gives
$$n(\Delta, \delta) \leq \frac{1}{2\Delta^2\theta^2} \log \frac{2}{\delta}$$
but applying Chernoff's bound:
$$P\Big( \frac{1}{n} \sum_{\ell=1}^n x^{(\ell)} > (1+\Delta)\theta \Big) \leq \exp\{-D_{\mathrm{KL}}((1+\Delta)\theta \,\|\, \theta)\, n\}, \text{ and}$$
$$P\Big( \frac{1}{n} \sum_{\ell=1}^n x^{(\ell)} < (1-\Delta)\theta \Big) \leq \exp\{-D_{\mathrm{KL}}((1-\Delta)\theta \,\|\, \theta)\, n\}$$
and using the fact that $D_{\mathrm{KL}}((1 \pm \Delta)\theta \,\|\, \theta) \geq (1/4)\Delta^2\theta$ for $|\Delta| < 1/2$, this can be tightened to
$$n(\Delta, \delta) \leq \frac{4}{\Delta^2\theta} \log \frac{2}{\delta}$$
13 Learning graphical models 11-13

Example. consider now a joint distribution $\mu(x)$ (satisfying $\mu(x) > 0$ for all $x$) over $x \in \{0, 1\}^p$ with a large $p$, and the estimator $\hat{\mu}(x) = \frac{1}{n} \sum_{\ell=1}^n \mathbb{I}(x^{(\ell)} = x)$

it follows that the minimum number of samples required to ensure that $P(\exists x \text{ such that } |\hat{\mu}(x) - \mu(x)| > \Delta \mu(x)) \leq \delta$ is
$$n(\Delta, \delta) \leq \frac{4}{\Delta^2 \min_x \{\mu(x)\}} \log \frac{2^{p+1}}{\delta}$$
which follows from Chernoff's bound with a union bound over $x \in \{0, 1\}^p$

since $\min_x \{\mu(x)\} \leq 2^{-p}$, the sample complexity is exponential in the dimension $p$; this is unavoidable in general, but when $\mu(\cdot)$ factorizes according to a graphical model with a sparse graph, the sample complexity decreases significantly
14 Learning graphical models 11-14

we want to learn the compatibility functions of a graphical model using local data, but the compatibility functions in
$$\mu(x) = \frac{1}{Z} \prod_{a \in F} \psi_a(x_{\partial a})$$
are not uniquely defined, so we first need to define a canonical form of compatibility functions

recall that a positive joint distribution that satisfies all conditional independencies according to a graph $G$ has a factorization according to the graph $G$, which is formally shown in the following

Hammersley-Clifford Theorem. Suppose $\mu(x) > 0$ satisfies all independencies implied by a graphical model $G(V, F, E)$; then
$$\mu(x) = \mu(0) \prod_{a \in F} \psi_a(x_{\partial a})$$
where
$$\psi_a(x_{\partial a}) = \prod_{U \subseteq \partial a} \mu(x_U, 0_{V \setminus U})^{(-1)^{|\partial a \setminus U|}} = \prod_{U \subseteq \partial a} \Big( \frac{\mu(x_U, 0_{V \setminus U})}{\mu(0_U, 0_{V \setminus U})} \Big)^{(-1)^{|\partial a \setminus U|}}$$
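the canonical form can be checked numerically via Möbius inversion of $\log \mu$: the sketch below (a toy chain with an illustrative positive joint, not from the lecture) computes each $\log \psi_A$ by inclusion-exclusion over $U \subseteq A$ and verifies that $\log \mu(x) = \log \mu(0) + \sum_A \log \psi_A(x_A)$ over the cliques of the chain:

```python
import itertools
import numpy as np

# Toy chain 0-1-2 over binary variables with a positive joint mu (illustrative).
p = 3
rng = np.random.default_rng(0)
f01 = rng.random((2, 2)) + 0.5  # positive edge factors
f12 = rng.random((2, 2)) + 0.5
mu = np.zeros((2,) * p)
for x in itertools.product([0, 1], repeat=p):
    mu[x] = f01[x[0], x[1]] * f12[x[1], x[2]]
mu /= mu.sum()

def mu_at(x_part, coords):
    # mu evaluated with x_part on coordinates `coords` and 0 elsewhere.
    full = [0] * p
    for val, i in zip(x_part, coords):
        full[i] = val
    return mu[tuple(full)]

def log_psi(A, x_A):
    # Canonical potential: sum over U subseteq A of (-1)^{|A\U|} log mu(x_U, 0_rest).
    total = 0.0
    for r in range(len(A) + 1):
        for pos in itertools.combinations(range(len(A)), r):
            total += ((-1) ** (len(A) - r)
                      * np.log(mu_at([x_A[k] for k in pos], [A[k] for k in pos])))
    return total

cliques = [(0,), (1,), (2,), (0, 1), (1, 2)]  # cliques of the chain 0-1-2
for x in itertools.product([0, 1], repeat=p):
    recon = np.log(mu[(0,) * p]) + sum(
        log_psi(A, [x[i] for i in A]) for A in cliques)
    assert np.isclose(np.log(mu[x]), recon)
print("canonical factorization verified")
```

note that $\log \psi_A$ vanishes for any $A$ that is not a clique (e.g. $A = (0, 2)$ in the chain), which is the content of the theorem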
15 Learning graphical models 11-15

to estimate each term from samples, notice that
$$\frac{\mu(x_U, 0_{V \setminus U})}{\mu(0_U, 0_{V \setminus U})} = \frac{\mu(x_U \mid 0_{V \setminus U})}{\mu(0_U \mid 0_{V \setminus U})} = \frac{\mu(x_U \mid 0_{\partial U})}{\mu(0_U \mid 0_{\partial U})}$$
where the last equality uses the Markov property, so each ratio is a conditional probability involving only $U$ and its boundary; if the degree is bounded by $k$, then this involves only $k^3$ variable nodes
16 Structural learning (Learning graphical models 11-16)

consider $p$ random variables $x_1, \dots, x_p$ represented as a graph with $p$ nodes; the number of potential edges is $\binom{p}{2} = p(p-1)/2$, and each is either present or not, resulting in $2^{p(p-1)/2}$ possible graphs

given $n$ i.i.d. observations $x^{(1)}, \dots, x^{(n)} \in \mathcal{X}^p$ (assuming there are no hidden nodes and all variables are observed), we want to select the best graph among the $2^{p(p-1)/2}$ candidates, i.e.
- how do we give scores to each model (i.e. graph) given data?
- how do we find the model with the highest score?
17 Learning graphical models 11-17

in statistics there are two schools of thought: frequentist and Bayesian

frequentists assume the parameters of interest are deterministic but unknown; in particular there is no distribution over $\theta$; maximum likelihood (ML) estimation finds the $\theta$ that maximizes the score of a model parameter, with the score being the log-likelihood $\log P_\theta(x^{(1)}, \dots, x^{(n)})$

Bayesians assume there exists a distribution over the parameter of interest, $\mu(\theta)$; maximum a posteriori (MAP) estimation finds the $\theta$ maximizing the posterior distribution evaluated on the given data, $P(\theta \mid x^{(1)}, \dots, x^{(n)})$

the frequentist's approach to graph learning:
$$\hat{G} = \arg\max_G \max_{\theta_G} \underbrace{\frac{1}{n} \sum_{i=1}^n \log \mu_{\theta_G, G}(x^{(i)})}_{L(G; x^{(1)}, \dots, x^{(n)})}$$
18 Learning graphical models 11-18

likelihood scores for graphical models are closely related to mutual information and entropy; recall that
$$I(X; Y) = \sum_{x,y} \mu(x, y) \log \frac{\mu(x, y)}{\mu(x)\mu(y)}, \qquad H(X) = -\sum_x \mu(x) \log \mu(x)$$
$$H(Y \mid X) = -\sum_{x,y} \mu(x, y) \log \mu(y \mid x) = -I(X; Y) + H(Y)$$
19 Learning graphical models 11-19

consider a directed acyclic graphical model (DAG) where
$$\mu_G(x) = \prod_{i=1}^p \mu(x_i \mid x_{\pi(i)})$$
with $\pi(i)$ the parents of node $i$; $\theta_G$ represents all the parameters required to describe the conditional distributions (e.g. the entries of the table describing the conditional distribution), $\mu(x_i \mid x_{\pi(i)}) = [\theta_G^{(i)}]_{x_i, x_{\pi(i)}}$

in computing
$$L(G; x^{(1)}, \dots, x^{(n)}) = \max_{\theta_G} \frac{1}{n} \sum_{i=1}^n \log \mu_{\theta_G, G}(x^{(i)})$$
the ML estimate $\hat{\theta}_G$ for given data is the empirical distribution, i.e.
$$[\hat{\theta}_G^{(i)}]_{x_i, x_{\pi(i)}} = \hat{\mu}(x_i \mid x_{\pi(i)}) = \frac{\hat{\mu}(x_i, x_{\pi(i)})}{\hat{\mu}(x_{\pi(i)})}$$
where $\hat{\mu}$ denotes the empirical distribution
20 Learning graphical models 11-20

it follows that the log-likelihood decomposes as
$$L(G; x^{(1)}, \dots, x^{(n)}) = \sum_{i=1}^p L_i(G; x^{(1)}, \dots, x^{(n)})$$
where
$$L_i(G; x^{(1)}, \dots, x^{(n)}) = \frac{1}{n} \sum_{\ell \in [n]} \log \hat{\mu}(x_i^{(\ell)} \mid x_{\pi(i)}^{(\ell)}) = \sum_{x_i, x_{\pi(i)}} \hat{\mu}(x_i, x_{\pi(i)}) \log \hat{\mu}(x_i \mid x_{\pi(i)}) = -H(\hat{X}_i \mid \hat{X}_{\pi(i)}) = I(\hat{X}_i; \hat{X}_{\pi(i)}) - H(\hat{X}_i)$$
together, this gives
$$L(G; x^{(1)}, \dots, x^{(n)}) = \sum_{i=1}^p I(\hat{X}_i; \hat{X}_{\pi(i)}) - \underbrace{\sum_{i=1}^p H(\hat{X}_i)}_{\text{independent of } G}$$
21 Learning graphical models 11-21

given a graph $G$, this gives the likelihood; the same node entropies are subtracted for all graphs, so we only need to compare the sum of mutual information terms per graph

Chow-Liu (1968) addressed this for graphs restricted to trees, where each node has only one parent; in this case, the pairwise mutual information can be computed for all $\binom{p}{2}$ pairs, and finding the ML graph amounts to finding the maximum-weight spanning tree on this weighted complete graph; this can be done efficiently using, for example, Kruskal's algorithm, where you sort the edges according to the mutual information and add edges iteratively starting from the largest one, adding an edge each time if it does not introduce a cycle; one needs to be careful with the direction in order to avoid a node having two parents (see the sketch below)
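a sketch of the Chow-Liu procedure (assuming binary observations in an $n \times p$ array; the plug-in mutual information estimator, the union-find Kruskal implementation, and the synthetic chain data are illustrative assumptions):

```python
import numpy as np
from itertools import combinations

def empirical_mi(a, b):
    # Plug-in mutual information I(X;Y) between paired binary samples a, b.
    mi = 0.0
    for u in (0, 1):
        for v in (0, 1):
            pxy = np.mean((a == u) & (b == v))
            px, py = np.mean(a == u), np.mean(b == v)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def chow_liu(samples):
    # samples: (n, p) binary array; returns the max-weight spanning tree edges.
    _, p = samples.shape
    scored = sorted(((empirical_mi(samples[:, i], samples[:, j]), i, j)
                     for i, j in combinations(range(p), 2)), reverse=True)
    parent = list(range(p))          # union-find to detect cycles (Kruskal)
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    tree = []
    for _, i, j in scored:           # add edges by decreasing mutual information
        ri, rj = find(i), find(j)
        if ri != rj:                 # skip edges that would close a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Usage on synthetic data: a hidden chain 0-1-2-3, each node copying its
# predecessor with probability 0.9 (placeholder data, illustrative only).
rng = np.random.default_rng(0)
n, p = 2000, 4
data = np.empty((n, p), dtype=int)
data[:, 0] = rng.integers(0, 2, n)
for i in range(1, p):
    flip = rng.random(n) < 0.1
    data[:, i] = np.where(flip, 1 - data[:, i - 1], data[:, i - 1])
print(chow_liu(data))                # expect the chain edges (0,1), (1,2), (2,3)
```

rooting the recovered tree at any node and orienting edges away from the root gives each node a single parent, handling the direction caveat above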
22 Learning graphical models 11-22

the ML score always favors more complex models, i.e. by adding more edges one can easily increase the likelihood (this follows from the fact that complex models include simpler models as special cases); without a penalty on complexity or a restriction to a class of models, ML will end up giving a complete graph

one way to address this issue is to introduce a prior, which we do not address in this lecture
23 Parameter estimation from partial observations (Learning graphical models 11-23)