Online Learning of Probabilistic Graphical Models
Frédéric Koriche, CRIL (CNRS UMR 8188, Univ. Artois), koriche@cril.fr. CRIL-U Nankin, 2016.
Outline
1. Probabilistic Graphical Models
2. Online Learning
3. Online Learning of Markov Forests
4. Beyond Markov Forests
Part 1: Probabilistic Graphical Models

Graphical Models
At the intersection of probability theory and graph theory, graphical models are a well-studied representation framework with applications in bioinformatics, financial modeling, image processing, and social network analysis.
[Figure: an undirected graph over variables X_1, ..., X_12.]
Graphical models encode high-dimensional distributions in a compact and intuitive way:
Qualitative uncertainty (interdependencies) is captured by the structure;
Quantitative uncertainty (probabilities) is captured by the parameters.
Classes of Graphical Models
Examples include Bayesian networks (sparse Bayesian networks, Bayesian polytrees, Bayesian forests) and Markov networks (sparse Markov networks, bounded tree-width MNs, Markov forests).
For an outcome space X ⊆ R^n, a class of graphical models is a pair M = G × Θ, where G is a space of n-dimensional graphs and Θ is a space of d-dimensional parameter vectors.
G captures structural constraints (directed vs. undirected, sparse vs. dense, etc.);
Θ captures parametric constraints (binomial, multinomial, Gaussian, etc.).
(Multinomial) Markov Forests
[Figure: a Markov forest over X_1, ..., X_15, annotated with a node table P(X_1) and an edge table P(X_2, X_3).]
Using [m] = {0, ..., m−1}, the class of Markov forests over [m]^n is given by F_n × Θ_{m,n}, where
F_n is the space of all acyclic graphs of order n;
Θ_{m,n} is the space of all parameter vectors mapping a probability table θ_i ∈ [0,1]^m to each candidate node i, and a probability table θ_ij ∈ [0,1]^{m×m} to each candidate edge (i,j).
(Multinomial) Bayesian Forests
[Figure: a Bayesian forest over X_1, ..., X_15, annotated with a node table P(X_1) and a conditional table P(X_3 | X_2).]
The class of Bayesian forests over [m]^n is given by F_n × Θ_{m,n}, where
F_n is the space of all directed forests of order n;
Θ_{m,n} is the space of all parameter vectors mapping a probability table θ_i ∈ [0,1]^m to each candidate node i, and a conditional probability table θ_{j|i} : [m] → [0,1]^m to each candidate arc (i,j).
Example: The Alarm Model
Variables: Earthquake, Burglary, Radio, Alarm, Call.
Markov tree representation: edges carry joint tables such as P(A, E).
Bayesian tree representation: arcs carry conditional tables such as P(A | E).
Part 2: Online Learning
Learning
Given a class of graphical models M = G × Θ, the learning problem is to extract, from a sequence of outcomes x_{1:T} = (x_1, ..., x_T), the structure G ∈ G and the parameters θ ∈ Θ of a generative model M = (G, θ) capable of predicting future, unseen outcomes.
A natural loss function for measuring the performance of M is the log-loss:
l(M, x) = −ln P_M(x)
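The factorized forest distribution and the log-loss above can be sketched in a few lines. The three-variable tree and all probability tables below are made-up illustrative numbers (with consistent marginals), not parameters from the talk:

```python
import math

# Toy Markov tree over three binary variables; edges (0,1) and (1,2).
# All tables are illustrative numbers, chosen so that each edge table's
# marginals match the corresponding node tables.
theta_node = {
    0: [0.8, 0.2],          # P(X0)
    1: [0.8, 0.2],          # P(X1)
    2: [0.7, 0.3],          # P(X2)
}
theta_edge = {
    (0, 1): [[0.7, 0.1],    # joint table P(X0, X1)
             [0.1, 0.1]],
    (1, 2): [[0.6, 0.2],    # joint table P(X1, X2)
             [0.1, 0.1]],
}

def forest_prob(x, nodes, edges):
    """Closed-form factorization of a Markov forest:
    P(x) = prod_i theta_i(x_i) * prod_(i,j) theta_ij(x_i,x_j) / (theta_i(x_i) theta_j(x_j))."""
    p = 1.0
    for i, tbl in nodes.items():
        p *= tbl[x[i]]
    for (i, j), tbl in edges.items():
        p *= tbl[x[i]][x[j]] / (nodes[i][x[i]] * nodes[j][x[j]])
    return p

def log_loss(x, nodes, edges):
    """l(M, x) = -ln P_M(x)."""
    return -math.log(forest_prob(x, nodes, edges))

x = (0, 0, 0)
print(forest_prob(x, theta_node, theta_edge))
print(log_loss(x, theta_node, theta_edge))
```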
Batch Learning
A two-stage process:
1. A model M ∈ M is first extracted from a training set.
2. The average loss of the model M, (1/T) Σ_{t=1}^T l(M, x_t), is evaluated using a test set of size T.
Online Learning
A sequential process, or repeated game, between the learner and its environment. During each trial t = 1, ..., T:
1. the learner chooses a model M_t = (G_t, θ_t) ∈ M;
2. the environment responds with an outcome x_t ∈ X;
3. the learner incurs the loss l(M_t, x_t) and moves on to M_{t+1} = (G_{t+1}, θ_{t+1}).
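The repeated game above can be sketched as a generic loop. The `LaplaceBernoulli` learner and `FixedSequence` environment below are toy stand-ins for illustration, not the algorithms of the talk:

```python
import math

class LaplaceBernoulli:
    """Toy learner: predicts a Bernoulli parameter from add-one smoothed counts."""
    def __init__(self):
        self.counts = [1, 1]
    def predict(self):
        return self.counts[1] / sum(self.counts)
    def update(self, x):
        self.counts[x] += 1

class FixedSequence:
    """Toy environment replaying a fixed outcome sequence."""
    def __init__(self, xs):
        self.xs = xs
    def outcome(self, t):
        return self.xs[t]

def bernoulli_log_loss(p, x):
    return -math.log(p if x == 1 else 1.0 - p)

def online_protocol(learner, env, T, loss):
    """The repeated game: choose a model, observe an outcome, incur a loss, update."""
    cumulative = 0.0
    for t in range(T):
        model = learner.predict()      # learner chooses M_t
        x = env.outcome(t)             # environment responds with x_t
        cumulative += loss(model, x)   # learner incurs l(M_t, x_t)
        learner.update(x)              # learner moves to M_{t+1}
    return cumulative

env = FixedSequence([1, 1, 0, 1])
print(online_protocol(LaplaceBernoulli(), env, 4, bernoulli_log_loss))
```

Any concrete algorithm (e.g. FPL + Jeffreys, later in the talk) fits the same `predict`/`update` loop.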
Online vs. Batch
In batch (PAC) learning, the data (training set + test set) is assumed to be generated by a fixed but unknown probability distribution. In online learning, there is no statistical assumption about the data generated by the environment. Online learning is particularly suited to:
* adaptive environments, where the target distribution can change over time;
* streaming applications, where the data is not all available in advance;
* large-scale datasets, by processing only one outcome at a time.
Regret
Let M be a class of graphical models over an outcome space X, and let A be a learning algorithm for M. The minimax regret of A at horizon T is the maximum, over every sequence of outcomes x_{1:T} = (x_1, ..., x_T), of the cumulative relative loss between A and the best model in M, i.e.
R(A, T) = max_{x_{1:T} ∈ X^T} [ Σ_{t=1}^T l(M_t, x_t) − min_{M ∈ M} Σ_{t=1}^T l(M, x_t) ]
Learnability
A class M is (online) learnable if it admits an online learning algorithm A such that:
1. the minimax regret of A is sublinear in T, i.e. lim_{T→∞} R(A, T)/T = 0;
2. the per-round computational complexity of A is polynomial in the dimension of M.
Part 3: Online Learning of Markov Forests (regret decomposition, parametric regret, structural regret)
[Figure: a Markov forest over X_1, ..., X_15.]
Does there exist an efficient online learning algorithm for Markov forests? How can we efficiently update, at each iteration, both the structure F_t and the parameters θ_t in order to minimize the cumulative loss Σ_t l(F_t, θ_t, x_t)?
Two Key Properties
For the class F_{m,n} = F_n × Θ_{m,n} of Markov forests:
1. The probability distribution associated with a Markov forest M = (F, θ) factorizes in closed form:
P_M(x) = Π_{i=1}^n θ_i(x_i) · Π_{(i,j)∈F} θ_ij(x_i, x_j) / (θ_i(x_i) θ_j(x_j))
2. The space F_n of forest structures is a matroid; minimizing a linear function over F_n can be done in quadratic time using the matroid greedy algorithm.
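The second property can be illustrated with a Kruskal-style sketch of the matroid greedy algorithm: scan candidate edges by increasing weight and keep an edge while it has negative weight and preserves acyclicity (the independence test of the graphic matroid). The function name and union-find layout are illustrative choices:

```python
def min_weight_forest(n, weights):
    """Matroid greedy sketch for minimizing a linear function <f, w> over forests.
    weights: dict mapping each candidate edge (i, j) to a real weight."""
    parent = list(range(n))
    def find(i):
        # Union-find root with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    forest = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: kv[1]):
        if w >= 0:          # a non-negative edge cannot lower the cost
            break
        ri, rj = find(i), find(j)
        if ri != rj:        # acyclicity = independence in the graphic matroid
            parent[ri] = rj
            forest.append((i, j))
    return forest

# Usage: edges with negative weight are added greedily unless they close a cycle.
print(min_weight_forest(4, {(0, 1): -2.0, (1, 2): -1.0, (0, 2): -3.0, (2, 3): 0.5}))
```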
Log-Loss
Let M = (f, θ) be a Markov forest, where f is the characteristic vector of the structure:
l(M, x) = −ln P_M(x) = −ln [ Π_{i=1}^n θ_i(x_i) · Π_{(i,j)} ( θ_ij(x_i, x_j) / (θ_i(x_i) θ_j(x_j)) )^{f_ij} ]
So the log-loss is an affine function of the forest structure:
l(M, x) = ψ(x) + ⟨f, φ(x)⟩
where ψ(x) = Σ_{i∈[n]} ln( 1 / θ_i(x_i) ) and φ_ij(x_i, x_j) = ln( θ_i(x_i) θ_j(x_j) / θ_ij(x_i, x_j) )
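A quick numerical check of the affine form, using a toy two-edge forest with consistent marginals (illustrative numbers, not from the talk):

```python
import math

# Toy tables: nodes 0, 1, 2 with edges (0,1) and (1,2); marginals are consistent.
theta = {0: [0.8, 0.2], 1: [0.8, 0.2], 2: [0.7, 0.3]}
theta_e = {(0, 1): [[0.7, 0.1], [0.1, 0.1]],
           (1, 2): [[0.6, 0.2], [0.1, 0.1]]}
F = [(0, 1), (1, 2)]          # edges whose entries of f are set to 1
x = (0, 0, 0)

# Affine form: l(M, x) = psi(x) + <f, phi(x)>.
psi = sum(math.log(1.0 / theta[i][x[i]]) for i in theta)
phi = {(i, j): math.log(theta[i][x[i]] * theta[j][x[j]] / theta_e[i, j][x[i]][x[j]])
       for (i, j) in theta_e}
affine = psi + sum(phi[e] for e in F)

# Direct log-loss from the factorized distribution.
p = 1.0
for i in theta:
    p *= theta[i][x[i]]
for (i, j) in F:
    p *= theta_e[i, j][x[i]][x[j]] / (theta[i][x[i]] * theta[j][x[j]])
direct = -math.log(p)

print(abs(affine - direct) < 1e-12)   # the two expressions agree
```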
Regret Decomposition
Based on the linearity of the log-loss, the regret decomposes into two parts:
R(M_{1:T}, x_{1:T}) = R(f_{1:T}, x_{1:T}) + R(θ_{1:T}, x_{1:T})
where
R(f_{1:T}, x_{1:T}) = Σ_{t=1}^T [ l(f_t, θ_t, x_t) − l(f*, θ_t, x_t) ]  (structural regret)
R(θ_{1:T}, x_{1:T}) = Σ_{t=1}^T [ l(f*, θ_t, x_t) − l(f*, θ*, x_t) ]  (parametric regret)
Parametric Regret
Based on the closed-form expression of Markov forests, the parametric regret decomposes into local regrets:
R(θ_{1:T}, x_{1:T}) = Σ_{i=1}^n ln( θ*_i(x_i^{1:T}) / θ_i^{1:T}(x_i^{1:T}) )  (univariate estimators)
 + Σ_{(i,j)∈F*} ln( θ*_ij(x_ij^{1:T}) / θ_ij^{1:T}(x_ij^{1:T}) )  (bivariate estimators)
 + Σ_{(i,j)∈F*} ln( θ_i^{1:T}(x_i^{1:T}) θ_j^{1:T}(x_j^{1:T}) / ( θ*_i(x_i^{1:T}) θ*_j(x_j^{1:T}) ) )  (bivariate compensation)
where
θ*_i(x_i^{1:T}) = Π_{t=1}^T θ*_i(x_i^t),  θ*_ij(x_ij^{1:T}) = Π_{t=1}^T θ*_ij(x_i^t, x_j^t),
θ_i^{1:T}(x_i^{1:T}) = Π_{t=1}^T θ_i^t(x_i^t),  θ_ij^{1:T}(x_ij^{1:T}) = Π_{t=1}^T θ_ij^t(x_i^t, x_j^t).
Local Regrets
Expressions of the form θ*(x^{1:T}) / θ^{1:T}(x^{1:T}) have been extensively studied in the literature on universal coding (Grünwald, 2007). The key idea: use Dirichlet mixtures.
Dirichlet Mixtures
Using symmetric Dirichlet mixtures for the parameter estimators,
θ^{1:T}(x^{1:T}) = ∫ Π_{t=1}^T P_λ(x^t) p_μ(λ) dλ = ( Γ(mμ) / Γ(μ)^m ) · Π_{v=1}^m Γ(t_v + μ) / Γ(T + mμ)
where t_v is the number of occurrences of value v in x^{1:T}. If μ = 1/2, we get the Jeffreys mixture.
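The closed form above can be checked against the chained Dirichlet predictive rule P(x^{t+1} = v | x^{1:t}) = (t_v + μ) / (t + mμ), which produces the same sequence probability. The function names are illustrative:

```python
import math

def dirichlet_mixture_prob(seq, m, mu=0.5):
    """Closed-form marginal likelihood of a sequence under a symmetric
    Dirichlet(mu) mixture over multinomial parameters; mu = 1/2 gives
    the Jeffreys mixture."""
    t = len(seq)
    counts = [seq.count(v) for v in range(m)]
    log_p = (math.lgamma(m * mu) - m * math.lgamma(mu)
             + sum(math.lgamma(c + mu) for c in counts)
             - math.lgamma(t + m * mu))
    return math.exp(log_p)

def sequential_prob(seq, m, mu=0.5):
    """Same quantity, computed by chaining the predictive rule
    P(x^{t+1} = v | x^{1:t}) = (t_v + mu) / (t + m*mu)."""
    counts = [0] * m
    p = 1.0
    for t, v in enumerate(seq):
        p *= (counts[v] + mu) / (t + m * mu)
        counts[v] += 1
    return p

seq = [0, 1, 0, 0, 1]
print(dirichlet_mixture_prob(seq, m=2), sequential_prob(seq, m=2))
```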
Jeffreys Strategy (an extension of Xie and Barron (2000) to forest parameters)
For each trial t:
1. Set θ_i^{t+1}(u) = (t_u + 1/2) / (t + m/2) for all i ∈ [n], u ∈ [m]
2. Set θ_ij^{t+1}(u, v) = (t_uv + 1/2) / (t + m²/2) for all pairs (i,j), u, v ∈ [m]
Performance
Minimax regret: ( n(m−1) + (n−1)(m−1)² ) / 2 · ln( T / 2π ) + C_{m,n} + o(m²n)
Per-round time complexity: O(m²n²)
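The Jeffreys strategy amounts to add-1/2 smoothing of the empirical node and pair counts; a batch-style sketch (the function name and interface are illustrative):

```python
def jeffreys_tables(data, n, m):
    """Jeffreys (add-1/2) node and edge tables after observing `data`,
    a list of outcomes, each an n-tuple over [m] = {0, ..., m-1}."""
    t = len(data)
    # theta_i(u) = (t_u + 1/2) / (t + m/2)
    node = [[(sum(1 for x in data if x[i] == u) + 0.5) / (t + m / 2)
             for u in range(m)] for i in range(n)]
    # theta_ij(u, v) = (t_uv + 1/2) / (t + m^2/2)
    edge = {}
    for i in range(n):
        for j in range(i + 1, n):
            edge[i, j] = [[(sum(1 for x in data if (x[i], x[j]) == (u, v)) + 0.5)
                           / (t + m * m / 2)
                           for v in range(m)] for u in range(m)]
    return node, edge

# Usage: each resulting table sums to 1 by construction.
node, edge = jeffreys_tables([(0, 1), (0, 0), (1, 1)], n=2, m=2)
print(node[0], edge[0, 1])
```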
Structural Regret
Based on the linear form of the log-loss, the structure learning problem can be cast as a sequential combinatorial optimization problem (Audibert et al., 2011):
Minimize Σ_{t=1}^T ⟨f_t, φ(x_t)⟩ subject to f_1, ..., f_T ∈ F_n
Key tool: use the greedy algorithm.
Follow the Perturbed Leader (Kalai and Vempala, 2005)
For each trial t:
1. Draw r_t in [0, 1/α_t] uniformly at random
2. Set f_{t+1} = argmin_{f ∈ F_n} ⟨f, L_t + r_t⟩, where L_t = Σ_{s=1}^t φ(x_s)
Performance
Minimax regret: n² ln(T/2 + m²/4) √(2T)
Per-round time complexity: O(n² log n)
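A minimal sketch of the FPL update: perturb the cumulative edge losses, then run the matroid greedy algorithm on the perturbed weights. A single scalar range `epsilon` stands in for the interval [0, 1/α_t]; names and interface are illustrative:

```python
import random

def min_weight_forest(n, weights):
    """Kruskal-style matroid greedy: keep negative-weight edges, cheapest
    first, while the selection stays acyclic."""
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    forest = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: kv[1]):
        if w >= 0:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            forest.append((i, j))
    return forest

def fpl_step(n, cumulative_loss, epsilon, rng=random):
    """One FPL update. cumulative_loss: dict edge -> L_t[e] = sum_s phi_e(x_s);
    epsilon plays the role of 1/alpha_t in the slide above."""
    perturbed = {e: L + rng.uniform(0.0, epsilon)
                 for e, L in cumulative_loss.items()}
    return min_weight_forest(n, perturbed)

# With epsilon = 0 the update degenerates to follow-the-leader.
print(fpl_step(4, {(0, 1): -2.0, (1, 2): -1.0, (0, 2): -3.0}, epsilon=0.0))
```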
Experiments
[Figure: average log-loss vs. iterations for CL, CLT, OFDE-F, OFDE-T on the KDD Cup 2000 and MSNBC datasets.]
The average log-loss is measured on the test set at each iteration. The online learner (FPL + Jeffreys) rapidly converges to the batch learner (Chow-Liu for Markov trees, and thresholded Chow-Liu for Markov forests).
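The batch baseline can be sketched as well: thresholded Chow-Liu weights each candidate edge by empirical mutual information and keeps a maximum-weight forest, dropping edges whose mutual information falls below a threshold. The function name and threshold convention are illustrative:

```python
import math
from collections import Counter

def chow_liu_forest(data, n, threshold=0.0):
    """Thresholded Chow-Liu sketch: empirical mutual information weights,
    then a greedy maximum-weight forest over edges above `threshold`."""
    t = float(len(data))
    mi = {}
    for i in range(n):
        for j in range(i + 1, n):
            pij = Counter((x[i], x[j]) for x in data)
            pi = Counter(x[i] for x in data)
            pj = Counter(x[j] for x in data)
            # I(X_i; X_j) = sum_uv p(u,v) ln( p(u,v) / (p(u) p(v)) )
            mi[i, j] = sum((c / t) * math.log((c / t) / ((pi[u] / t) * (pj[v] / t)))
                           for (u, v), c in pij.items())
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    forest = []
    for (i, j), w in sorted(mi.items(), key=lambda kv: -kv[1]):
        if w <= threshold:      # thresholding yields a forest, not a full tree
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            forest.append((i, j))
    return forest

# Usage: X0 and X1 are perfectly correlated, X2 is independent noise.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(chow_liu_forest(data, 3, threshold=1e-6))
```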
More informationOnline Kernel PCA with Entropic Matrix Updates
Online Kernel PCA with Entropic Matrix Updates Dima Kuzmin Manfred K. Warmuth University of California - Santa Cruz ICML 2007, Corvallis, Oregon April 23, 2008 D. Kuzmin, M. Warmuth (UCSC) Online Kernel
More information27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling
10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel
More informationExpectation Propagation Algorithm
Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,
More information{ p if x = 1 1 p if x = 0
Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =
More informationMachine Learning Summer School
Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationLearning P-maps Param. Learning
Readings: K&F: 3.3, 3.4, 16.1, 16.2, 16.3, 16.4 Learning P-maps Param. Learning Graphical Models 10708 Carlos Guestrin Carnegie Mellon University September 24 th, 2008 10-708 Carlos Guestrin 2006-2008
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationPROBABILISTIC REASONING SYSTEMS
PROBABILISTIC REASONING SYSTEMS In which we explain how to build reasoning systems that use network models to reason with uncertainty according to the laws of probability theory. Outline Knowledge in uncertain
More informationNotes on MapReduce Algorithms
Notes on MapReduce Algorithms Barna Saha 1 Finding Minimum Spanning Tree of a Dense Graph in MapReduce We are given a graph G = (V, E) on V = N vertices and E = m N 1+c edges for some constant c > 0. Our
More informationCollapsed Variational Inference for Sum-Product Networks
for Sum-Product Networks Han Zhao 1, Tameem Adel 2, Geoff Gordon 1, Brandon Amos 1 Presented by: Han Zhao Carnegie Mellon University 1, University of Amsterdam 2 June. 20th, 2016 1 / 26 Outline Background
More informationAn Introduction to Bayesian Machine Learning
1 An Introduction to Bayesian Machine Learning José Miguel Hernández-Lobato Department of Engineering, Cambridge University April 8, 2013 2 What is Machine Learning? The design of computational systems
More informationThe geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan
The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan Background: Global Optimization and Gaussian Processes The Geometry of Gaussian Processes and the Chaining Trick Algorithm
More informationBayesian Networks Inference with Probabilistic Graphical Models
4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning
More informationDirected Graphical Models or Bayesian Networks
Directed Graphical Models or Bayesian Networks Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Bayesian Networks One of the most exciting recent advancements in statistical AI Compact
More information1 Bayesian Linear Regression (BLR)
Statistical Techniques in Robotics (STR, S15) Lecture#10 (Wednesday, February 11) Lecturer: Byron Boots Gaussian Properties, Bayesian Linear Regression 1 Bayesian Linear Regression (BLR) In linear regression,
More informationIntroduction to continuous and hybrid. Bayesian networks
Introduction to continuous and hybrid Bayesian networks Joanna Ficek Supervisor: Paul Fink, M.Sc. Department of Statistics LMU January 16, 2016 Outline Introduction Gaussians Hybrid BNs Continuous children
More informationIntroduction to Graphical Models
Introduction to Graphical Models The 15 th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang 2018. 1. 9 (Tue.) Yung-Kyun Noh GENERALIZATION FOR PREDICTION 2 Probabilistic
More informationIntroduction to Probability and Statistics (Continued)
Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:
More informationOnline Algorithms for Sum-Product
Online Algorithms for Sum-Product Networks with Continuous Variables Priyank Jaini Ph.D. Seminar Consistent/Robust Tensor Decomposition And Spectral Learning Offline Bayesian Learning ADF, EP, SGD, oem
More informationCS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016
CS 2750: Machine Learning Bayesian Networks Prof. Adriana Kovashka University of Pittsburgh March 14, 2016 Plan for today and next week Today and next time: Bayesian networks (Bishop Sec. 8.1) Conditional
More informationReview: Bayesian learning and inference
Review: Bayesian learning and inference Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E Inference problem:
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationLearning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley
Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. Converting online to batch. Online convex optimization.
More informationDynamic Matrix-Variate Graphical Models A Synopsis 1
Proc. Valencia / ISBA 8th World Meeting on Bayesian Statistics Benidorm (Alicante, Spain), June 1st 6th, 2006 Dynamic Matrix-Variate Graphical Models A Synopsis 1 Carlos M. Carvalho & Mike West ISDS, Duke
More informationIntelligent Systems:
Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition
More informationLecture 9: PGM Learning
13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and
More information10708 Graphical Models: Homework 2
10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves
More informationAchievability of Asymptotic Minimax Regret by Horizon-Dependent and Horizon-Independent Strategies
Journal of Machine Learning Research 16 015) 1-48 Submitted 7/14; Revised 1/14; Published 8/15 Achievability of Asymptotic Minimax Regret by Horizon-Dependent and Horizon-Independent Strategies Kazuho
More informationScoring functions for learning Bayesian networks
Tópicos Avançados p. 1/48 Scoring functions for learning Bayesian networks Alexandra M. Carvalho Tópicos Avançados p. 2/48 Plan Learning Bayesian networks Scoring functions for learning Bayesian networks:
More informationDS-GA 1002 Lecture notes 11 Fall Bayesian statistics
DS-GA 100 Lecture notes 11 Fall 016 Bayesian statistics In the frequentist paradigm we model the data as realizations from a distribution that depends on deterministic parameters. In contrast, in Bayesian
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Uncertainty & Probabilities & Bandits Daniel Hennes 16.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Uncertainty Probability
More informationDeep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationStochastic prediction of train delays with dynamic Bayesian networks. Author(s): Kecman, Pavle; Corman, Francesco; Peterson, Anders; Joborn, Martin
Research Collection Other Conference Item Stochastic prediction of train delays with dynamic Bayesian networks Author(s): Kecman, Pavle; Corman, Francesco; Peterson, Anders; Joborn, Martin Publication
More informationVers un apprentissage subquadratique pour les mélanges d arbres
Vers un apprentissage subquadratique pour les mélanges d arbres F. Schnitzler 1 P. Leray 2 L. Wehenkel 1 fschnitzler@ulg.ac.be 1 Université deliège 2 Université de Nantes 10 mai 2010 F. Schnitzler (ULG)
More informationCS599: Convex and Combinatorial Optimization Fall 2013 Lecture 17: Combinatorial Problems as Linear Programs III. Instructor: Shaddin Dughmi
CS599: Convex and Combinatorial Optimization Fall 2013 Lecture 17: Combinatorial Problems as Linear Programs III Instructor: Shaddin Dughmi Announcements Today: Spanning Trees and Flows Flexibility awarded
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationOnline Learning Summer School Copenhagen 2015 Lecture 1
Online Learning Summer School Copenhagen 2015 Lecture 1 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Online Learning Shai Shalev-Shwartz (Hebrew U) OLSS Lecture
More information