Introduction to Graphical Models
1 Introduction to Graphical Models The 15th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang (Tue.) Yung-Kyun Noh
2 GENERALIZATION FOR PREDICTION 2
3 Probabilistic Assumption and Bayes Classification [Figure: class-conditional densities p_1(x) and p_2(x) over the data space; their overlap determines the Bayes error] 3
4 Probabilistic Assumption and Bayes Classification [Figure: class-conditional densities p_1(x) and p_2(x) over the data space] 4
5 Learning Data (Regularity) Prediction Learning Learn prediction function from data ( : Hypothesis set/candidate set) 5
6 Quantify the Evaluation Measure of quality: expected loss R(f) = E_{x,y}[l(f(x), y)], where l is a loss function. Estimated error: R_hat(f) = (1/N) sum_{i=1}^N l(f(x_i), y_i). Examples: 0/1 loss for classification, squared loss for regression. 6
7 Consistent Learner satisfies <Uniform convergence> Caution: The definition of consistency is not 7
8 Quiz 1 Consider a hypothesis set which satisfies Explain that learning with is not consistent though it satisfies. N What is the possible problem in this case? 8
9 Consistency and Bayes Error Minimizing expected error (objective) vs. minimizing estimated error. [Figure: error versus model complexity, from smaller (simple classifier) to larger (complex classifier); the error for training data (train error) decreases with complexity while the Bayes error stays fixed] (For example, a linear classifier with regularization) 9
10 Consistency and Bayes Error Consistent learner with many data. [Figure: with many data the train error approaches the minimum error of the consistent learner, which can still lie above the Bayes error] Consistency does not necessarily imply achieving the Bayes error!!! (For example, a linear classifier with regularization) 10
11 Related Terms with Confining H Linear model VC-dim for classification = Dimensionality + 1 Small number of parameters Large margin Regularization Bias-Variance trade-off Generalization ability, overfitting Many terms are theoretically connected 11
12 Supervised Learning (Prediction) Method: learn from labeled examples (horse, car, ship, plane, ...) so that unseen data can be classified. New unseen data -> classifier -> label (a horse). [Based on the assumption of regularity] 12
13 Representation of Data Data space Each datum is one point in a data space: x = [1, 2, 5, 10, ...]^T (D elements) 13
14 Classification 14
15 Bayes Optimal Classifier Our ultimate goal is not a zero error. (Optimal) Bayes error Figure credit: Masashi Sugiyama 15
16 FISHER DISCRIMINANT ANALYSIS 16
17 Gaussian Random Variable Principal axes are the eigenvector directions of the covariance Sigma; lambda_1, lambda_2: eigenvalues 17
18 Covariance Matrix and Projection covariance = Sigma; variance along a unit direction w: w^T Sigma w; lambda_i: eigenvalues with Sigma u_i = lambda_i u_i 18
19 Parameter Estimation Maximum Likelihood Estimation Data: Mean vector: Covariance matrix: 19
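The ML estimators on this slide can be written in a few lines of numpy (a sketch; the variable names are ours, the data synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))           # N = 1000 samples in D = 3 dimensions

mu_hat = X.mean(axis=0)                  # ML mean: the sample average
# ML covariance: average outer product of centered samples
# (divides by N, not N-1, as maximum likelihood prescribes)
Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / X.shape[0]
```

Note the 1/N normalization: the ML covariance estimator is biased, unlike the usual 1/(N-1) sample covariance.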
20 Fisher Discriminant Analysis - Consider Two Statistics Between-class variance 20
21 Fisher Discriminant Analysis - Consider Two Statistics Within-class variance : covariance matrix of each class 21
22 Total Variance Total covariance matrix: Total variance along : 22
23 Fisher Discriminant Analysis (FDA) Find the direction w with maximal J(w) = (between-class variance) / (within-class variance) for the given data. [Figure: projections onto candidate directions w_1 and w_2] 23
24 Take the Derivative Take the derivative Generalized eigenvector problem 24
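The generalized eigenvector problem, and the two-class closed form from the next quiz, can be sketched on synthetic data (all names and numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(loc=[0.0, 0.0], size=(200, 2))   # class 1 samples
X2 = rng.normal(loc=[3.0, 1.0], size=(200, 2))   # class 2 samples

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
SB = np.outer(m1 - m2, m1 - m2)                          # between-class scatter
SW = np.cov(X1.T, bias=True) + np.cov(X2.T, bias=True)   # within-class scatter

# Generalized eigenproblem S_B w = lambda S_W w, solved as S_W^{-1} S_B w = lambda w
vals, vecs = np.linalg.eig(np.linalg.solve(SW, SB))
w = np.real(vecs[:, np.argmax(np.real(vals))])

# Two-class closed form: w is proportional to S_W^{-1} (m1 - m2)
w_closed = np.linalg.solve(SW, m1 - m2)
cosine = abs(w @ w_closed) / (np.linalg.norm(w) * np.linalg.norm(w_closed))
```

Because S_B is rank one, the generalized eigenvector with nonzero eigenvalue coincides (up to scale) with the closed form, so `cosine` comes out essentially 1.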
25 Quiz 1: Closed Form Solution Problem: For a two-class problem with N_1 = N_2, assume S_W is a full-rank matrix. Find the closed-form solution for w. 25
26 Quiz 2: How to solve the generalized eigenvector problem? Problem: Are the solution eigenvalues real (non-complex)? If not, how are these eigenvalues treated? 26
27 Quiz 3: Two different FDAs Some papers consider the criterion instead of What is the difference? 27
28 In High Dimensional Space (1/2) S_W is not a full-rank matrix [Figure: eigenvalue spectrum; only N - C of the D eigenvalues are nonzero] 28
29 In High Dimensional Space (2/2) FDA will trivially pick up the null space of S_W as the solution. More seriously, the null space varies with the sampling: an ill-posed problem. 29
30 Regularization in FDA Regularize: S_W -> S_W + lambda*I, lifting the zero eigenvalues [Figure: eigenvalue spectra before and after regularization; the drop at N - C disappears]. The problem becomes well-posed. New solution for two classes: w proportional to (S_W + lambda*I)^{-1}(m_1 - m_2). 30
31 Well-posed Problem vs. Ill-Posed Problem Well-posed Ill-posed The variation of configuration may come from the sampling variation. 31
32 Quiz 4: New S_W with Infinite Data If an infinite number of data points is generated around each datum from an isotropic Gaussian, what does the new S_W become? 32
33 GRAPHICAL MODELS 33
34 Naïve Bayes As An Extreme Case Naïve Bayes: p(x) = prod_{d=1}^D p_d(x_d) = p_1(x_1) p_2(x_2) ... p_D(x_D). Simply ignores all correlations and dependencies between variables. True decomposition (chain rule): p(x) = p(x_1) p(x_2|x_1) p(x_3|x_1, x_2) ... p(x_D|x_1, ..., x_{D-1}) 34
35 Number of Parameters D = 1000. Number of parameters of a Gaussian: 1000 + 1000*1001/2 = 501,500 35
36 Independence p(x) = p_1(x_1) p_2(x_2), D = D_1 + D_2, D_1 = 500, D_2 = 500. Number of parameters: (500 + 500*501/2) + (500 + 500*501/2) = 251,500. Incorporating one independence can reduce the number of parameters by roughly half. 36
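The parameter counts on these two slides follow from one formula; a small helper reproduces them (the function name is ours):

```python
def gaussian_params(d):
    """Free parameters of a d-dimensional Gaussian:
    d mean entries plus d*(d+1)/2 entries of the symmetric covariance."""
    return d + d * (d + 1) // 2

full = gaussian_params(1000)       # one 1000-dimensional Gaussian
split = 2 * gaussian_params(500)   # two independent 500-dimensional blocks
```

Here `full` is 501,500 and `split` is 251,500, matching the slides: one assumed independence roughly halves the parameter count.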
37 Graphical Models We utilize probability distributions represented by a graph structure (directed & undirected), exploiting the probabilistic independencies and conditional independencies that the graph structure can capture. 37
38 Directed Graphical Models Factorization of a (large) joint pdf: P(X) = P(X_1, ..., X_7) = P(X_1) P(X_2) P(X_3|X_1, X_2) P(X_4, X_5|X_2) P(X_6) P(X_7|X_4, X_5, X_6). In general, P(X_1, ..., X_D) = prod_{i=1}^D P(X_i | pa(X_i)). For given data, make a model for each decomposed probability, then estimate parameters separately. 38
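A factorized joint like this can be assembled from small conditional tables; here is a minimal sketch on a hypothetical 3-variable network X1 -> X3 <- X2 with made-up CPT values:

```python
import itertools

# Hypothetical CPTs over binary variables (all numbers illustrative)
p_x1 = {0: 0.6, 1: 0.4}
p_x2 = {0: 0.7, 1: 0.3}
p_x3 = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.4, 1: 0.6},
        (1, 0): {0: 0.3, 1: 0.7}, (1, 1): {0: 0.1, 1: 0.9}}

def joint(x1, x2, x3):
    # Factorized joint: P(X1) P(X2) P(X3 | X1, X2)
    return p_x1[x1] * p_x2[x2] * p_x3[(x1, x2)][x3]

# Summing the product over all configurations must give 1
total = sum(joint(*xs) for xs in itertools.product([0, 1], repeat=3))
```

Each factor is estimated separately from data, yet their product is automatically a valid joint distribution.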
39 D-Separation Causal path (X_1 -> X_2 -> X_3): X_1 independent of X_3 given X_2. Common cause (X_2 <- X_1 -> X_3): X_2 independent of X_3 given X_1. Common effect (X_1 -> X_3 <- X_2): X_1 independent of X_2, but X_1 and X_2 are NOT independent given X_3. 39
40 Discrete Random Variables 40
41 Discrete Random Variables X 1 X 2 X 4 X 6 P (1) X 3 X 5 41
42 Inference with Discrete Variables 42
43 Inference with Discrete Variables Marginalize X 5 43
44 Inference with Discrete Variables Marginalize X 5! 44
45 Inference with Discrete Variables Marginalize X 3 45
46 Inference with Discrete Variables Marginalize X 3 46
47 Inference with Discrete Variables Marginalize X 2 47
48 Inference with Discrete Variables 48
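The repeated marginalizations above are the variable-elimination idea: sum out one variable at a time, passing the resulting factor along. A minimal numpy sketch on a hypothetical binary chain X1 -> X2 -> X3 (all CPT values made up):

```python
import numpy as np

p1 = np.array([0.5, 0.5])              # P(X1)
p2_given_1 = np.array([[0.8, 0.2],     # P(X2 | X1); rows indexed by X1
                       [0.3, 0.7]])
p3_given_2 = np.array([[0.6, 0.4],     # P(X3 | X2); rows indexed by X2
                       [0.1, 0.9]])

# Eliminate X1 first (message to X2), then X2, leaving the marginal P(X3)
msg_x2 = p1 @ p2_given_1               # P(X2) after summing out X1
p3 = msg_x2 @ p3_given_2               # P(X3) after summing out X2
```

Each elimination is just a matrix-vector product here, so the cost grows with the largest intermediate factor, not with the full joint table.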
49 Continuous Random Variables Each probability density is a Gaussian 49
50 Continuous Random Variables Jointly Gaussian with 6 variables: need 6 + 6*7/2 = 27 parameters. Using a graphical model, we need to obtain the parameters of each decomposed conditional density: we need to estimate only 19 parameters. 50
51 Gaussian Random Variable Marginal 51
52 Gaussian Random Variable Conditional 52
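The marginalization and conditioning rules for a joint Gaussian can be checked on a small hypothetical example (the numbers are made up):

```python
import numpy as np

# 2-D jointly Gaussian (x_a, x_b)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Marginal of x_a: simply read off the corresponding block
mu_a, var_a = mu[0], Sigma[0, 0]

# Conditional x_a | x_b = 3, via the standard Gaussian conditioning formulas:
#   mean: mu_a + Sigma_ab Sigma_bb^{-1} (x_b - mu_b)
#   var:  Sigma_aa - Sigma_ab Sigma_bb^{-1} Sigma_ba
xb = 3.0
mu_cond = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (xb - mu[1])
var_cond = Sigma[0, 0] - Sigma[0, 1] * Sigma[1, 0] / Sigma[1, 1]
```

Both operations return Gaussians again, which is what makes exact inference in Gaussian graphical models (and the Kalman filter below) tractable.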
53 KALMAN FILTER 53
54 Filtering Hidden Markov Models (HMM) / Linear Dynamical Systems (LDS) [Figure: state chain X_1 -> X_2 -> ... -> X_{k-1} -> X_k with observations Y_1, Y_2, ..., Y_{k-1}, Y_k; discrete states give an HMM, linear-Gaussian states give an LDS] 54
55 Kalman Filter Marginalizing x_0, ..., x_{T-1}. Filtering = conditional marginalization, marginalization from the left. 55
56 Kalman Filter Unconstrained distribution Also, for joint density if necessary 56
57 Kalman Filter Unconstrained distribution 57
58 Kalman Filter Constrained distribution Marginalizing x 0 Same y and are from unconstrained distribution. What matters is y 0 58
59 Kalman Filter Marginalizing x_1, ..., x_{T-1}. What matters is how we can find the covariance of the joint density function!! 59
60 Kalman Filter Filtering Conditional marginalization Marginalization from the left Why filtering? Once we know the current filtered estimate, we don't have to know (or keep) the past observations. 60
61 Kalman Filter Time update Measurement update Time update Similar to the unconstrained distribution calculation! 61
62 Kalman Filter Also, Joint: 62
63 Kalman Filter Measurement update (Conditional density) Sum - ups 63
64 Kalman Filter With different notation, Alternative form of K t+1 64
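One time-update/measurement-update cycle can be sketched for a scalar state; A, C, Q, R and all numbers below are illustrative, not values from the slides:

```python
import numpy as np

A, C = 1.0, 1.0          # state transition and observation coefficients
Q, R = 0.1, 0.5          # process and measurement noise variances
mu, P = 0.0, 1.0         # current posterior mean and variance

# Time update: propagate the Gaussian through the dynamics
mu_pred = A * mu
P_pred = A * P * A + Q

# Measurement update: fold in the observation y via the Kalman gain
y = 1.2
K = P_pred * C / (C * P_pred * C + R)
mu = mu_pred + K * (y - C * mu_pred)
P = (1 - K * C) * P_pred
```

In matrix form the scalar divisions become inverses, but the two-step structure (predict, then condition on y) is exactly the one sketched on these slides.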
65 Markov Random Field Undirected Graph If there is a direct edge between X_i and X_j: X_i and X_j are NOT conditionally independent given all the other variables. If there is no direct edge between X_i and X_j: X_i is independent of X_j given all the other variables. Examples: X_1 and X_3 are not independent given X_2, X_4, X_5; X_1 is independent of X_5 given X_3. 65
66 Joint Distribution Product of functions on cliques: P(X_1, X_2, X_3, X_4, X_5) = (1/Z) psi_{1,2}(X_1, X_2) psi_{1,3}(X_1, X_3) psi_{3,4,5}(X_3, X_4, X_5), with Z = sum over X_1, ..., X_5 of psi_{1,2}(X_1, X_2) psi_{1,3}(X_1, X_3) psi_{3,4,5}(X_3, X_4, X_5). Cliques: {X_1, X_2}, {X_1, X_3}, {X_3, X_4, X_5}. The set of distributions satisfying MRF conditions (Markov random field) = the set of distributions decomposed by cliques (Gibbs random field) (Hammersley-Clifford Theorem) 66
67 More Fancy Models Latent Variable Model X Z Y Filtering X Z Y Z X Z 1 Z 2 Z 3 Z k 1 Z k Y 1 Y 2 Y 3 Y k 1 Y k 67
68 Topic Models 68
69 Topic Models 69
70 Correlation and Independency For a Gaussian, independency means no correlation. Does the same hold for Naïve Bayes? For a mixture of Gaussians? 70
71 Conditional Independency vs. 71
72 Y Naïve Bayes for Classification X 1 X 2 X D P (X; Y ) = P (Y )P (X 1 jy ) : : : P (X D jy ) 72
73 Undirected Graphical Models Independent 73
74 Undirected Graphical Models Find all maximal cliques: 74
75 Undirected Graphical Models Potential functions on cliques Psi_1(X_1), Psi_2(X_2), ... (X_1, X_2, ...: maximal cliques). P(X) = (1/Z) Psi_1(X_1) Psi_2(X_2) ... Psi_C(X_C). Partition function: Z = sum over X_1, ..., X_D of Psi_1(X_1) ... Psi_C(X_C) (discrete), or Z = integral of Psi_1(X_1) ... Psi_C(X_C) dX_1 ... dX_C (continuous). 75
76 Undirected Graphical Models 2-cliques {X_1, X_2, X_3} and {X_3, X_4}. Z = sum over X_1, X_2, X_3, X_4 of Psi_{X1,X2,X3}(X_1, X_2, X_3) Psi_{X3,X4}(X_3, X_4) = Psi_{X1,X2,X3}(1, 1, 1) Psi_{X3,X4}(1, 1) + Psi_{X1,X2,X3}(1, 1, 1) Psi_{X3,X4}(1, 0) + ... Ex. P(1, 0, 0, 0) = Psi_{X1,X2,X3}(1, 0, 0) Psi_{X3,X4}(0, 0) / Z [numeric potential values elided in transcription] 76
77 Estimating Parameters 2-cliques {X_1, X_2, X_3} and {X_3, X_4} [tables of clique potential values, elided in transcription]. Psi_{X1,X2,X3}: 8 values, Psi_{X3,X4}: 4 values, so 12 parameters. Without a graphical model: 15 parameters (2^4 - 1). 77
78 Conditional Independency P(X_1, X_2, X_3) = P(X_1) P(X_2|X_1) P(X_3|X_2). P(X_3=1 | X_1=0, X_2=1) = P(X_1=0, X_2=1, X_3=1) / [P(X_1=0, X_2=1, X_3=1) + P(X_1=0, X_2=1, X_3=0)] = P(X_1=0) P(X_2=1|X_1=0) P(X_3=1|X_2=1) / [P(X_1=0) P(X_2=1|X_1=0) P(X_3=1|X_2=1) + P(X_1=0) P(X_2=1|X_1=0) P(X_3=0|X_2=1)] = P(X_3=1|X_2=1) / [P(X_3=1|X_2=1) + P(X_3=0|X_2=1)] = P(X_3=1|X_2=1) 78
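The chain-rule identity on this slide can be checked numerically: in a chain X1 -> X2 -> X3, conditioning on X1 changes nothing once X2 is known. The CPT values below are arbitrary:

```python
import numpy as np

p1 = np.array([0.4, 0.6])       # P(X1)
p21 = np.array([[0.7, 0.3],     # P(X2 | X1); rows indexed by X1
                [0.2, 0.8]])
p32 = np.array([[0.9, 0.1],     # P(X3 | X2); rows indexed by X2
                [0.5, 0.5]])

# Full joint table P(X1, X2, X3) from the chain factorization
joint = p1[:, None, None] * p21[:, :, None] * p32[None, :, :]

# P(X3 = 1 | X1 = 0, X2 = 1) computed directly from the joint ...
lhs = joint[0, 1, 1] / joint[0, 1, :].sum()
# ... equals P(X3 = 1 | X2 = 1): X1 cancels, as in the derivation above
rhs = p32[1, 1]
```

The common factors P(X_1=0) P(X_2=1|X_1=0) appear in numerator and denominator and cancel, which is exactly why `lhs` and `rhs` agree for any choice of CPTs.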
79 Conditional Independency 2-cliques {X_1, X_2, X_3} and {X_3, X_4} [tables of potential values for Psi_{X1,X2,X3} and Psi_{X3,X4}, elided in transcription]. P(X_4=1 | X_1=0, X_2=1) =? P(X_4=1 | X_1=0, X_2=0) =? P(X_4=1) =? All answers are the same. 79
80 Marginalization P(X_1, X_2, X_3) = P(X_1) P(X_2|X_1) P(X_3|X_2). P(X_1, X_3) = integral of P(X_1, X_2, X_3) dX_2. Any good property like P(X_1, X_3) = P(X_1) P(X_3)? 80
81 Marginalization P(X_1, ..., X_D) = P(X_1) ... P(X_{i-1}) P(X_i | X_1, ..., X_{i-1}) P(X_{i+1}|X_i) ... P(X_D|X_i). P(X_1, ..., X_{i-1}, X_{i+1}, ..., X_D) = integral of P(X_1, ..., X_D) dX_i. Any decomposition after marginalizing X_i? P(X_1, ..., X_{i-1}, X_{i+1}, ..., X_D) = P(X_1) ... P(X_{i-1}) P(X_{i+1} | X_1, ..., X_{i-1}) ... P(X_D | X_1, ..., X_{i-1}) 81
82 Marginalization Marginalize X i P (X D jx 1 ; : : : ; X i 1 ) 82
83 Introducing Latent Variables Z X_1 X_2 X_3 X_4 vs. X_1 X_2 X_3 X_4 Issue: how can a model be simplified as much as possible while keeping enough flexibility to incorporate the true dependency? 83
84 Expectation-Maximization Algorithm Parameter estimation with latent variables. We don't have data for the latent variables. E-step: data for the latent variables are obtained by expectation under the current parameter values. M-step: with the expected latent variables, parameters are obtained by maximizing the likelihood. The E-step and M-step are repeated back and forth until the likelihood converges. 84
85 Expectation-Maximization Algorithm Gaussian mixture model. We are given x_i for i = 1, ..., N. Parameters: pi_k, mu_k, Sigma_k for k = 1, ..., K. Unknown variables: z_i for i = 1, ..., N. E-step: distribution of z_i (responsibilities) gamma(z_ik) = pi_k N(x_i | mu_k, Sigma_k) / sum_{j=1}^K pi_j N(x_i | mu_j, Sigma_j), using the current parameters pi_k, mu_k, Sigma_k. 85
86 Expectation-Maximization Algorithm M-step: estimate the parameters. mu_k = (1/N_k) sum_{i=1}^N gamma(z_ik) x_i, Sigma_k = (1/N_k) sum_{i=1}^N gamma(z_ik) (x_i - mu_k)(x_i - mu_k)^T, pi_k = N_k / N, where N_k = sum_{i=1}^N gamma(z_ik). Iterate until ln p(X | pi, mu, Sigma) = sum_{i=1}^N ln [ sum_{k=1}^K pi_k N(x_i | mu_k, Sigma_k) ] converges. 86
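These update equations can be run directly; below is a minimal 1-D EM loop for a two-component mixture on synthetic data (all data, initial values, and names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic 1-D data: two well-separated components
X = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])

K = 2
pi = np.full(K, 1 / K)          # mixing weights pi_k
mu = np.array([-1.0, 1.0])      # component means mu_k
var = np.ones(K)                # component variances (Sigma_k in 1-D)

def normal_pdf(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

for _ in range(50):
    # E-step: responsibilities gamma(z_ik)
    dens = np.stack([pi[k] * normal_pdf(X, mu[k], var[k]) for k in range(K)], axis=1)
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate pi_k, mu_k, var_k from the responsibilities
    Nk = gamma.sum(axis=0)
    mu = (gamma * X[:, None]).sum(axis=0) / Nk
    var = (gamma * (X[:, None] - mu) ** 2).sum(axis=0) / Nk
    pi = Nk / len(X)
```

With the components this far apart, the estimated means converge near the generating values -2 and 3; a fixed iteration count stands in for the log-likelihood convergence test on the slide.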
87 Gaussian Mixture Model With EM 87 C. Bishop 2007, Figure 9.8(a)-(f)
88 ANY QUESTIONS? 88
89 THANK YOU Yung-Kyun Noh 89
More informationLecture 7: Con3nuous Latent Variable Models
CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 7: Con3nuous Latent Variable Models All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/
More informationLecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models
Advanced Machine Learning Lecture 10 Mixture Models II 30.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ Announcement Exercise sheet 2 online Sampling Rejection Sampling Importance
More informationActive and Semi-supervised Kernel Classification
Active and Semi-supervised Kernel Classification Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London Work done in collaboration with Xiaojin Zhu (CMU), John Lafferty (CMU),
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Choosing Hypotheses Generally want the most probable hypothesis given the training data Maximum a posteriori
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationLearning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014
Learning with Noisy Labels Kate Niehaus Reading group 11-Feb-2014 Outline Motivations Generative model approach: Lawrence, N. & Scho lkopf, B. Estimating a Kernel Fisher Discriminant in the Presence of
More informationMachine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 18 Oct, 21, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models CPSC
More informationExploiting k-nearest Neighbor Information with Many Data
Exploiting k-nearest Neighbor Information with Many Data 2017 TEST TECHNOLOGY WORKSHOP 2017. 10. 24 (Tue.) Yung-Kyun Noh Robotics Lab., Contents Nonparametric methods for estimating density functions Nearest
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationMachine Learning, Fall 2009: Midterm
10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all
More informationLecture 13: Data Modelling and Distributions. Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1
Lecture 13: Data Modelling and Distributions Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1 Why data distributions? It is a well established fact that many naturally occurring
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING Text Data: Topic Model Instructor: Yizhou Sun yzsun@cs.ucla.edu December 4, 2017 Methods to be Learnt Vector Data Set Data Sequence Data Text Data Classification Clustering
More informationHidden Markov Models
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Hidden Markov Models Matt Gormley Lecture 22 April 2, 2018 1 Reminders Homework
More information2 : Directed GMs: Bayesian Networks
10-708: Probabilistic Graphical Models 10-708, Spring 2017 2 : Directed GMs: Bayesian Networks Lecturer: Eric P. Xing Scribes: Jayanth Koushik, Hiroaki Hayashi, Christian Perez Topic: Directed GMs 1 Types
More informationLearning the Linear Dynamical System with ASOS ( Approximated Second-Order Statistics )
Learning the Linear Dynamical System with ASOS ( Approximated Second-Order Statistics ) James Martens University of Toronto June 24, 2010 Computer Science UNIVERSITY OF TORONTO James Martens (U of T) Learning
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1
More informationDiversity-Promoting Bayesian Learning of Latent Variable Models
Diversity-Promoting Bayesian Learning of Latent Variable Models Pengtao Xie 1, Jun Zhu 1,2 and Eric Xing 1 1 Machine Learning Department, Carnegie Mellon University 2 Department of Computer Science and
More informationPATTERN CLASSIFICATION
PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More information