Learning Bayesian networks: Given structure and completely observed data


1 Learning Bayesian networks: Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani

2 Learning problem Target: the true distribution P that may correspond to a model M = (K, θ). Hypothesis space: a specified family of probabilistic graphical models. Data: a set of instances sampled from P. Learning goal: selecting a model M that constructs the best approximation to the true model according to a performance metric.

3 Learning tasks on graphical models Parameter learning / structure learning. Completely observable / partially observable data. Directed model / undirected model.

4 Parameter learning in directed models (complete data) We assume that the structure of the model is known: consider learning parameters for a BN with a given structure. Goal: estimate CPDs from a dataset D = {x^(1), ..., x^(N)} of independent, identically distributed (i.i.d.) training samples. Each training sample x^(n) = (x_1^(n), ..., x_L^(n)) is a vector in which every element x_i^(n) is known (no missing values, no hidden variables).

5 Density estimation review We use density estimation to solve this learning problem. Density estimation: estimating the probability density function P(x), given a set of data points x^(i) drawn from it. Parametric methods: assume P(x) has a specific functional form with a number of adjustable parameters; MLE and Bayesian estimation. MLE: need to determine θ given {x^(1), ..., x^(N)}; MLE has an overfitting problem. Bayesian estimation: maintains a probability distribution P(θ) over a spectrum of hypotheses; needs a prior distribution on the parameters.

6 Density estimation: Graphical model (plate notation) Under the i.i.d. assumption, θ is a shared parent of every sample X^(n), n = 1, ..., N; with hyperparameters α, the model becomes α → θ → X^(n), n = 1, ..., N.

7 Maximum Likelihood Estimation (MLE) The likelihood is the conditional probability of the observations D = {x^(1), x^(2), ..., x^(N)} given the value of the parameters θ. Assuming i.i.d. (independent, identically distributed) samples: P(D|θ) = P(x^(1), ..., x^(N)|θ) = ∏_{n=1}^N P(x^(n)|θ), the likelihood of θ w.r.t. the samples. Maximum likelihood estimation: θ_ML = argmax_θ P(D|θ) = argmax_θ ∑_{n=1}^N ln P(x^(n)|θ). MLE has a closed-form solution for many parametric distributions.

8 MLE: Bernoulli distribution Given D = {x^(1), x^(2), ..., x^(N)} with m heads (1) and N−m tails (0): p(x|θ) = θ^x (1−θ)^{1−x}, so p(x=1|θ) = θ. p(D|θ) = ∏_{n=1}^N p(x^(n)|θ) = ∏_{n=1}^N θ^{x^(n)} (1−θ)^{1−x^(n)}. ln p(D|θ) = ∑_{n=1}^N ln p(x^(n)|θ) = ∑_{n=1}^N {x^(n) ln θ + (1−x^(n)) ln(1−θ)}. Setting ∂ ln p(D|θ)/∂θ = 0 gives θ_ML = (1/N) ∑_{n=1}^N x^(n) = m/N.
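A minimal numeric sketch of the Bernoulli MLE above; the coin-flip data values are made up for illustration, not from the slides:

```python
import numpy as np

# Illustrative coin-flip data: 1 = heads, 0 = tails (made-up sample).
x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

N = len(x)          # number of i.i.d. samples
m = x.sum()         # number of heads

theta_ml = m / N    # closed-form MLE: theta_ML = m / N
print(theta_ml)     # 0.7 for this sample
```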

9 MLE: Multinomial distribution Multinomial distribution on a variable with K states. Variable: 1-of-K coding, x = (x_1, ..., x_K), x_k ∈ {0,1}, ∑_{k=1}^K x_k = 1. Parameter space: θ = (θ_1, ..., θ_K), θ_k ∈ [0,1], ∑_{k=1}^K θ_k = 1. P(x|θ) = ∏_{k=1}^K θ_k^{x_k}, so P(x_k = 1) = θ_k. For K = 3, the constraint θ_1 + θ_2 + θ_3 = 1 with θ_i ∈ [0,1] is a simplex showing the set of valid parameters.

10 MLE: Multinomial distribution Given D = {x^(1), x^(2), ..., x^(N)}: P(D|θ) = ∏_{n=1}^N P(x^(n)|θ) = ∏_{n=1}^N ∏_{k=1}^K θ_k^{x_k^(n)} = ∏_{k=1}^K θ_k^{m_k}, where m_k = ∑_{n=1}^N x_k^(n). Maximizing the Lagrangian L(θ, λ) = ln p(D|θ) + λ(1 − ∑_{k=1}^K θ_k) gives θ_k,ML = (1/N) ∑_{n=1}^N x_k^(n) = m_k/N.
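A short sketch of the multinomial MLE from counts, assuming 1-of-K encoded samples; the data matrix below is illustrative only:

```python
import numpy as np

# Illustrative 1-of-K encoded samples (K = 3 states); each row is one x^(n).
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 1, 0]])

N = X.shape[0]
m = X.sum(axis=0)        # sufficient statistics m_k = sum_n x_k^(n)
theta_ml = m / N         # theta_k = m_k / N
print(theta_ml)          # [0.2, 0.6, 0.2]
```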

11 MLE: Gaussian with unknown μ ln P(x^(n)|μ) = −(1/2) ln(2πσ²) − (1/(2σ²)) (x^(n) − μ)². Setting ∂ ln P(D|μ)/∂μ = 0: ∑_{n=1}^N ∂/∂μ ln p(x^(n)|μ) = 0 ⟹ (1/σ²) ∑_{n=1}^N (x^(n) − μ) = 0 ⟹ μ_ML = (1/N) ∑_{n=1}^N x^(n).

12 Bayesian approach Parameters θ are treated as random variables with an a priori distribution, which utilizes the available prior information about the unknown parameter. As opposed to ML estimation, it does not seek a specific point estimate of the unknown parameter vector θ. The samples D convert the prior density P(θ) into a posterior density P(θ|D). We keep track of beliefs about the values of θ and use these beliefs for reaching conclusions.

13 Maximum A Posteriori (MAP) estimation MAP estimation: θ_MAP = argmax_θ p(θ|D). Since p(θ|D) ∝ p(D|θ) p(θ), θ_MAP = argmax_θ p(D|θ) p(θ). Example of a prior distribution: p(θ) = N(θ|θ_0, σ²).

14 Bayesian approach: Predictive distribution Given a set of samples D = {x^(i)}_{i=1}^N, a prior distribution on the parameters P(θ), and the form of the distribution P(x|θ): we find P(θ|D) and use it to specify P(x|D) on new data as an estimate of P(x): P(x|D) = ∫ P(x, θ|D) dθ = ∫ P(x|D, θ) P(θ|D) dθ = ∫ P(x|θ) P(θ|D) dθ (the predictive distribution). Analytical solutions exist only for very special forms of the involved functions.

15 Conjugate priors We consider a form of prior distribution that has a simple interpretation as well as some useful analytical properties: choosing a prior such that the posterior distribution, which is proportional to P(D|θ)P(θ), has the same functional form as the prior. That is, P(θ|α′) ∝ P(D|θ) P(θ|α), where the posterior P(θ|α′) and the prior P(θ|α) have the same functional form and α′ is determined by α and D.

16 Prior for the Bernoulli likelihood Beta distribution over θ ∈ [0,1]: Beta(θ|α_1, α_0) = (Γ(α_0 + α_1) / (Γ(α_0)Γ(α_1))) θ^{α_1 − 1} (1−θ)^{α_0 − 1} ∝ θ^{α_1 − 1} (1−θ)^{α_0 − 1}. E[θ] = α_1/(α_0 + α_1); the most probable θ (mode) is (α_1 − 1)/(α_0 + α_1 − 2). The Beta distribution is the conjugate prior of the Bernoulli: P(x|θ) = θ^x (1−θ)^{1−x}.

17 Beta distribution (figure: Beta densities for different settings of α_1 and α_0)

18 Bernoulli likelihood: posterior Given D = {x^(1), x^(2), ..., x^(N)} with m heads (1) and N−m tails (0): p(θ|D) ∝ p(D|θ) p(θ) = [∏_{i=1}^N θ^{x^(i)} (1−θ)^{1−x^(i)}] Beta(θ|α_1, α_0) ∝ θ^{m+α_1−1} (1−θ)^{N−m+α_0−1}. Hence p(θ|D) = Beta(θ|α_1′, α_0′) with α_1′ = α_1 + m and α_0′ = α_0 + (N − m), where m = ∑_{i=1}^N x^(i).

19 Example Prior: Beta with α_0 = α_1 = 2. Likelihood: Bernoulli, p(x|θ) = θ^x (1−θ)^{1−x}, p(x=1|θ) = θ. Given D = (1,1,1), i.e. N = 3 samples with m = 3 heads, the posterior is Beta with α_1′ = 5, α_0′ = 2, and θ_MAP = argmax_θ P(θ|D) = (α_1′ − 1)/(α_1′ + α_0′ − 2) = 4/5.

20 Bernoulli: Predictive distribution Training samples: D = {x^(1), ..., x^(N)}. Prior: P(θ) = Beta(θ|α_1, α_0) ∝ θ^{α_1−1}(1−θ)^{α_0−1}. Posterior: P(θ|D) = Beta(θ|α_1+m, α_0+N−m) ∝ θ^{α_1+m−1}(1−θ)^{α_0+N−m−1}. Predictive: P(x|D) = ∫ P(x|θ) P(θ|D) dθ = E_{P(θ|D)}[P(x|θ)], so P(x=1|D) = E_{P(θ|D)}[θ] = (α_1 + m)/(α_0 + α_1 + N).
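A small sketch tying slides 18–20 together: the Beta posterior, its MAP value, and the predictive probability of the next flip. Hyperparameters and data reproduce the example of slide 19; the variable names are illustrative:

```python
import numpy as np

# Beta prior pseudo-counts for 1s and 0s, as in the slide-19 example.
alpha1, alpha0 = 2.0, 2.0
x = np.array([1, 1, 1])            # observed coin flips D = (1,1,1)
N, m = len(x), x.sum()

# Conjugacy: posterior is Beta(alpha1 + m, alpha0 + N - m).
post_a1, post_a0 = alpha1 + m, alpha0 + (N - m)

theta_map = (post_a1 - 1) / (post_a1 + post_a0 - 2)   # posterior mode = 0.8
p_next_heads = post_a1 / (post_a1 + post_a0)          # predictive P(x=1|D) = 5/7
print(theta_map, p_next_heads)
```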

21 Dirichlet distribution Input space: θ = (θ_1, ..., θ_K)^T, θ_k ∈ [0,1], ∑_{k=1}^K θ_k = 1. P(θ|α) = (Γ(α̅)/(Γ(α_1)···Γ(α_K))) ∏_{k=1}^K θ_k^{α_k−1} ∝ ∏_{k=1}^K θ_k^{α_k−1}, where α̅ = ∑_{k=1}^K α_k. E[θ_k] = α_k/α̅; mode: θ_k = (α_k − 1)/(α̅ − K).

22 Dirichlet distribution: Examples α = [0.1, 0.1, 0.1], α = [1, 1, 1], α = [10, 10, 10]. The Dirichlet parameters determine both the prior beliefs and their strength: larger values of α correspond to more confidence in the prior belief (i.e., more imaginary samples).

23 Dirichlet distribution: Example α = [2, 2, 2], α = [20, 2, 2]

24 Multinomial distribution: Prior The Dirichlet distribution is the conjugate prior of the multinomial: P(θ|D, α) ∝ P(D|θ) P(θ|α) ∝ ∏_{k=1}^K θ_k^{m_k + α_k − 1}, where m = (m_1, ..., m_K)^T are the sufficient statistics of the data. Hence P(θ|D, α) = Dir(θ|α + m): prior θ ~ Dir(α_1, ..., α_K), posterior θ|D ~ Dir(α_1 + m_1, ..., α_K + m_K).

25 Multinomial: Predictive distribution Training samples: D = {x^(1), ..., x^(N)}. Prior: P(θ) = Dir(θ|α_1, ..., α_K). Posterior: P(θ|D) = Dir(θ|α_1+m_1, ..., α_K+m_K). Predictive: P(x|D) = ∫ P(x|θ) P(θ|D) dθ = E_{P(θ|D)}[P(x|θ)], so P(x_k = 1|D) = E_{P(θ|D)}[θ_k] = (α_k + m_k)/(α̅ + N). Larger α̅ means more confidence in our prior: (α_k + m_k)/(α̅ + N) = (α̅/(α̅+N)) (α_k/α̅) + (N/(α̅+N)) (m_k/N).
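A sketch of the Dirichlet-multinomial predictive estimate, including the convex combination of prior mean and MLE shown above; the prior and counts are illustrative:

```python
import numpy as np

# Illustrative Dirichlet prior and categorical counts (made-up numbers).
alpha = np.array([1.0, 1.0, 1.0])      # prior pseudo-counts, K = 3
m = np.array([5, 2, 0])                # observed counts m_k
N = m.sum()

# Posterior: Dir(alpha + m); predictive P(x_k=1|D) = (alpha_k + m_k)/(alpha_bar + N)
p_pred = (alpha + m) / (alpha.sum() + N)
print(p_pred)                          # [0.6, 0.3, 0.1]; the zero count is smoothed

# Same value as the convex combination of the prior mean and the MLE:
a_bar = alpha.sum()
p_combo = (a_bar / (a_bar + N)) * (alpha / a_bar) + (N / (a_bar + N)) * (m / N)
print(np.allclose(p_pred, p_combo))    # True
```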

26 Multinomial: Predictive distribution P(x|D) = Multi((α_1 + m_1)/(α̅ + N), ..., (α_K + m_K)/(α̅ + N)). Bayesian prediction combines sufficient statistics from imaginary Dirichlet samples and real data samples.

27 Example: MLE vs. Bayesian Effect of different priors on smoothing of the parameter estimates: θ ~ Beta(10,10) vs. θ ~ Beta(1,1). [Koller & Friedman Book]

28 Bayesian estimation Gaussian distribution with unknown μ (known σ): P(x|μ) = N(μ, σ²), prior P(μ) = N(μ_0, σ_0²). P(μ|D) ∝ P(D|μ) P(μ) = [∏_{n=1}^N P(x^(n)|μ)] P(μ), so P(μ|D) ∝ exp{−(1/2)[(N/σ² + 1/σ_0²) μ² − 2(∑_{n=1}^N x^(n)/σ² + μ_0/σ_0²) μ]}.

29 Bayesian estimation Gaussian distribution with unknown μ (known σ): P(μ|D) = N(μ_N, σ_N²), with 1/σ_N² = N/σ² + 1/σ_0² and μ_N = (σ_0²/(Nσ_0² + σ²)) ∑_{n=1}^N x^(n) + (σ²/(Nσ_0² + σ²)) μ_0. Predictive distribution: P(x|D) = ∫ P(x|μ) P(μ|D) dμ ∝ exp{−(1/2)(x − μ_N)²/(σ² + σ_N²)}, so P(x|D) = N(μ_N, σ² + σ_N²).
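A short sketch of the posterior over μ and the predictive variance for the known-σ Gaussian case; prior settings and data are made up for illustration:

```python
import numpy as np

# Illustrative data from a Gaussian with known sigma (values are made up).
sigma = 1.0                     # known observation noise std
mu0, sigma0 = 0.0, 2.0          # prior: mu ~ N(mu0, sigma0^2)
x = np.array([1.2, 0.8, 1.5, 1.1])
N = len(x)

# Posterior precision is the sum of the prior and data precisions.
post_var = 1.0 / (N / sigma**2 + 1.0 / sigma0**2)
post_mean = post_var * (x.sum() / sigma**2 + mu0 / sigma0**2)

# Predictive distribution for a new x: N(post_mean, sigma^2 + post_var).
pred_var = sigma**2 + post_var
print(post_mean, post_var, pred_var)
```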

30 Conjugate prior for the Gaussian distribution Known μ, unknown σ²: the conjugate prior for λ = 1/σ² is a Gamma with shape a_0 and rate (inverse scale) b_0; equivalently, the conjugate prior for σ² is the Inverse-Gamma. Gam(λ|a, b) = (1/Γ(a)) b^a λ^{a−1} exp(−bλ); InvGam(σ²|a, b) = (1/Γ(a)) b^a (σ²)^{−a−1} exp(−b/σ²). Unknown μ and unknown σ²: the conjugate prior P(μ, λ) is Normal-Gamma, P(μ, λ) = N(μ|μ_0, (βλ)^{−1}) Gam(λ|a_0, b_0). Multivariate case: the conjugate prior P(μ, Λ) is Normal-Wishart.

31 Bayesian estimation: Summary The Bayesian approach treats parameters as random variables; learning is then a special case of inference. It is asymptotically equivalent to MLE. For some parametric distributions, it has a closed form (obtained from the prior parameters and the sufficient statistics of the data).

32 Learning in Bayesian networks MLE: the likelihood decomposes over the nodes' conditional distributions. Bayesian: we can make some assumptions (global & local parameter independence) to simplify the learning.

33 Likelihood decomposition: Example Directed factorization causes the likelihood to decompose locally: P(X|θ) = P(X_1|θ_1) P(X_2|X_1, θ_2) P(X_3|X_1, θ_3) P(X_4|X_2, X_3, θ_4), for the network X_1 → X_2, X_1 → X_3, {X_2, X_3} → X_4.

34 Decomposition of the likelihood If we assume the parameters for each CPD are disjoint (i.e., disjoint parameter sets θ_{X_i|Pa_{X_i}} for different variables), and the data is complete, then the likelihood function decomposes over nodes: L(θ; D) = P(D|θ) = ∏_{n=1}^N ∏_{i∈V} P(X_i^(n)|Pa_{X_i}^(n), θ_i) = ∏_{i∈V} [∏_{n=1}^N P(X_i^(n)|Pa_{X_i}^(n), θ_i)] = ∏_{i∈V} L_i(θ_i; D), where θ_i ≡ θ_{X_i|Pa_{X_i}} and θ = [θ_1, ..., θ_|V|]. Hence θ̂ = argmax_θ L(θ; D) is obtained by θ̂_i = argmax_{θ_i} L_i(θ_i; D) for each node separately.

35 Local decomposition of the likelihood: Table-CPDs The choices of parameters given different value assignments to the parents are independent of each other; for each parent assignment, consider only the data points that agree with it: L_i(θ_{X_i|Pa_{X_i}}; D) = ∏_{n=1}^N θ_{X_i^(n)|Pa_{X_i}^(n)} = ∏_{u ∈ Val(Pa_{X_i})} ∏_{x ∈ Val(X_i)} θ_{x|u}^{∑_n I(X_i^(n)=x, Pa_{X_i}^(n)=u)}.

36 Local decomposition of the likelihood: Table-CPDs Each row of the CPT θ_i ≡ θ_{X_i|Pa_{X_i}} (one row per parent assignment Pa_{X_i} = u_1, ..., u_L, one column per value X_i = 1, ..., K) is a multinomial distribution θ_{X_i|u} = (θ_{1|u}^i, ..., θ_{K|u}^i) with ∑_k θ_{k|u}^i = 1. With counts m_i(k, u) = ∑_{n=1}^N I(X_i^(n) = k, Pa_{X_i}^(n) = u), the MLE is θ̂_{k|u}^i = m_i(k, u) / ∑_{k'} m_i(k', u) = m_i(k, u) / ∑_{n=1}^N I(Pa_{X_i}^(n) = u).
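A minimal sketch of the per-row counting estimate above: one multinomial per parent configuration. The helper function `mle_table_cpd` and the toy data are illustrative, not from the slides:

```python
import numpy as np
from collections import defaultdict

def mle_table_cpd(child, parents, K):
    """MLE of a table CPD: one multinomial per parent assignment.
    child:   length-N array of values in {0, ..., K-1}
    parents: N x P array of parent values
    Returns a dict mapping a parent tuple u to the estimated P(X = k | u)."""
    counts = defaultdict(lambda: np.zeros(K))
    for x, u in zip(child, map(tuple, parents)):
        counts[u][x] += 1                    # m_i(k, u)
    return {u: c / c.sum() for u, c in counts.items()}

# Illustrative data for a node X with one binary parent U (made-up values).
U = np.array([[0], [0], [1], [1], [1], [0]])
X = np.array([0, 1, 1, 1, 0, 0])
print(mle_table_cpd(X, U, K=2))
# P(X|U=0) ≈ [0.667, 0.333], P(X|U=1) ≈ [0.333, 0.667]
```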

37 Overfitting For a large parent set, ∏_{Z ∈ Pa_{X_i}} |Val(Z)| is large and most buckets will have very few instances; the number of parameters increases exponentially with |Pa_{X_i}|, giving poor estimation. With limited data, overfitting occurs; we often get better generalization with simpler structures. Zero-count problem (or sparse-data problem): e.g., consider a naïve Bayes classifier and a document d containing a word that does not occur in any training document for class c; then P(d|c) = 0.

38 ML Example: Naïve Bayes classifier Training phase (plate model): class Y^(n) ∈ {1, ..., C} with parameter π; features X_i^(n), i = 1, ..., D, n = 1, ..., N, with parameters θ_i = θ_{X_i|Y} = (θ_{X_i|1}, ..., θ_{X_i|C}). Test phase: Y → X_1, X_2, ..., X_D with parameters π and θ_1, θ_2, ..., θ_D.

39 ML Example: Naïve Bayes classifier L(θ; D) = ∏_{n=1}^N P(Y^(n)|π) ∏_{i=1}^D P(X_i^(n)|Y^(n), θ_{X_i|Y}). MLE: π̂_c = (1/N) ∑_{n=1}^N I(Y^(n) = c). For discrete inputs or features (X_i ∈ {1, ..., K}): θ̂_{k|c}^i = p̂(X_i = k|Y = c) = ∑_n I(X_i^(n) = k, Y^(n) = c) / ∑_n I(Y^(n) = c).
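A sketch of these counting estimates for naïve Bayes; the function name `nb_mle` and the toy dataset are illustrative assumptions:

```python
import numpy as np

def nb_mle(X, y, K, C):
    """MLE for a naive Bayes model with D discrete features in {0..K-1}
    and class labels in {0..C-1}.  Returns the class prior pi and the
    per-feature CPTs theta[i, k, c] = P(X_i = k | Y = c)."""
    N, D = X.shape
    pi = np.array([(y == c).mean() for c in range(C)])
    theta = np.zeros((D, K, C))
    for c in range(C):
        Xc = X[y == c]                         # rows with Y = c
        for i in range(D):
            for k in range(K):
                theta[i, k, c] = (Xc[:, i] == k).mean()
    return pi, theta

# Illustrative toy data (made up): 2 binary features, 2 classes.
X = np.array([[0, 1], [0, 0], [1, 1], [1, 1], [0, 1]])
y = np.array([0, 0, 1, 1, 1])
pi, theta = nb_mle(X, y, K=2, C=2)
print(pi)            # [0.4, 0.6]
```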

40 Recall: Maximum conditional likelihood Example: a discriminative classifier needs to learn the conditional distribution (not the joint distribution). Given D = {(x^(n), y^(n))}_{n=1}^N and a parametric conditional distribution P(y|x; θ), we find: θ̂ = argmax_θ ∏_{n=1}^N P(y^(n)|x^(n); θ). We will also see maximum conditional likelihood for more general CPDs than tabular ones.

41 Maximum conditional likelihood: Linear regression P(y|x, w) = N(y|w^T φ(x), σ²); linear case: φ(x) = [1, x_1, ..., x_D]. Given D = {(x^(n), y^(n))}_{n=1}^N: L(Y|X, w; D) = ∏_{n=1}^N P(Y^(n)|X^(n), w) = ∏_{n=1}^N N(Y^(n)|w^T φ(X^(n)), σ²), and ŵ = argmax_w ∏_{n=1}^N N(Y^(n)|w^T φ(X^(n)), σ²) = (Φ^T Φ)^{−1} Φ^T Y.
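A minimal sketch of the closed-form solution ŵ = (Φ^T Φ)^{−1} Φ^T Y; the 1-D dataset is made up for illustration:

```python
import numpy as np

# Illustrative 1-D data (made up); phi(x) = [1, x] gives linear regression.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

Phi = np.column_stack([np.ones_like(x), x])          # design matrix
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)          # (Phi^T Phi)^{-1} Phi^T y
print(w)    # approximately [1.09, 1.94]: intercept and slope
```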

42 Maximum conditional likelihood: Logistic regression P(y=1|x) = σ(w^T φ(x)); linear LR: φ(x) = [1, x_1, ..., x_D]. P(y|x, w) = Ber(y|σ(w^T φ(x))). Given D = {(x^(n), y^(n))}_{n=1}^N with y ∈ {0,1}: L(Y|X, w; D) = ∏_{n=1}^N P(y^(n)|x^(n), w) = ∏_{n=1}^N Ber(y^(n)|σ(w^T φ(x^(n)))) = ∏_{n=1}^N σ(w^T φ(x^(n)))^{y^(n)} (1 − σ(w^T φ(x^(n))))^{1−y^(n)}, where σ(z) = 1/(1 + exp(−z)).
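Unlike linear regression, this conditional likelihood has no closed-form maximizer; a standard approach (not shown on the slide) is gradient ascent on the log-likelihood. The sketch below, with illustrative data and a hypothetical `fit_logistic` helper, assumes that choice:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, iters=2000):
    """Maximize the conditional log-likelihood of logistic regression
    by batch gradient ascent (no closed form exists)."""
    Phi = np.column_stack([np.ones(len(X)), X])      # phi(x) = [1, x_1, ..., x_D]
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        p = sigmoid(Phi @ w)                         # P(y=1 | x, w)
        w += lr * Phi.T @ (y - p)                    # gradient of the log-likelihood
    return w

# Illustrative 1-D toy data (made up).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
print(fit_logistic(X, y))
```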

43 MLE for Bayesian networks: Summary For a BN with disjoint sets of parameters in the CPDs, the likelihood decomposes as a product of local likelihood functions over nodes; thus, we optimize the likelihoods of different nodes separately. For table CPDs, the local likelihood further decomposes as a product of multinomial likelihoods, one for each parent combination. MLE suffers from the sparse-data problem.

44 Bayesian learning for BNs: Global parameter independence Global parameter independence assumption: P(θ) = ∏_{i∈V} P(θ_i), where θ_i ≡ θ_{X_i|Pa_{X_i}}. Then P(θ|D) = (1/P(D)) P(D|θ) P(θ) = (1/P(D)) ∏_{i∈V} [∏_{n=1}^N P(X_i^(n)|Pa_{X_i}^(n), θ_i)] P(θ_i) = (1/P(D)) ∏_{i∈V} L_i(θ_i; D) P(θ_i), so P(θ|D) = ∏_{i∈V} P(θ_i|D): the posteriors of θ are independent given complete data and independent priors.

45 Bayesian learning for BNs: Global parameter independence How can we find this independence directly from the structure of the Bayesian network when P(θ) satisfies global parameter independence?

46 Bayesian learning for BNs: Global parameter independence Predictive distribution: P(x|D) = ∫ P(x|θ, D) P(θ|D) dθ = ∫ P(x|θ) P(θ|D) dθ (instances are independent given the parameters, for complete data) = ∫ ∏_{i∈V} P(x_i|pa_{X_i}, θ_i) ∏_{i∈V} P(θ_i|D) dθ = ∏_{i∈V} ∫ P(x_i|pa_{X_i}, θ_i) P(θ_i|D) dθ_i, using ∬ f(x)g(y) dx dy = (∫ f(x) dx)(∫ g(y) dy), with θ_i ≡ θ_{X_i|Pa_{X_i}}. Solve the prediction problem for each CPD independently and then combine the results.

47 Local decomposition for table CPDs The prior P(θ_{X|Pa_X}) satisfies the local parameter independence assumption if: P(θ_{X|Pa_X}) = ∏_{u ∈ Val(Pa_X)} P(θ_{X|u}), i.e., the groups of parameters in different rows of a CPD are independent. The posteriors P(θ_{X|u}|D) and P(θ_{X|u'}|D) are independent despite the v-structure on X, because Pa_X acts like a multiplexer. (Plate: Y^(n) ∈ {1, ..., C} selects among θ_{X|1}, ..., θ_{X|C} for X^(n), n = 1, ..., N.)

48 Global and local parameter independence Let the data be complete and the CPDs be tabular, with K = |Val(X_i)|. If the prior P(θ) satisfies global and local parameter independence, then: P(θ|D) = ∏_{i∈V} ∏_{u ∈ Val(Pa_{X_i})} P(θ_{X_i|u}|D); with prior θ_{X_i|u} ~ Dir(α_{1|u}^i, ..., α_{K|u}^i), the posterior is θ_{X_i|u}|D ~ Dir(α_{1|u}^i + m_i(1, u), ..., α_{K|u}^i + m_i(K, u)), and P(X_i = k|u, D) = (α_{k|u}^i + m_i(k, u)) / ∑_{k'=1}^K (α_{k'|u}^i + m_i(k', u)).

49 Priors for Bayesian learning The priors α_{X_i|Pa_{X_i}} must be defined: α_{x|u} for all x ∈ Val(X_i), u ∈ Val(Pa_{X_i}). K2 prior: a fixed prior for all hyper-parameters, α_{x|u} = 1. BDe prior: store an α and a prior distribution P′ over the variables in the network, α_{x|u} = α P′(X_i = x, Pa_{X_i} = u), where P′ is defined as a set of independent marginals over the X_i's.

50 Naïve Bayes Example: Bayesian learning Training phase (plate model): hyperparameter α for π over the class Y^(n) ∈ {1, ..., C}; hyperparameters α_{X_i|c} for θ_{X_i|c}, c = 1, ..., C, over the features X_i^(n), i = 1, ..., D, n = 1, ..., N; θ_i = θ_{X_i|Y} = (θ_{X_i|1}, ..., θ_{X_i|C}).

51 Naïve Bayes Example: Bayesian learning Y ~ Mult(π) (i.e., P(Y = c|π) = π_c). Prior: P(π|α_1, ..., α_C) = Dir(π|α_1, ..., α_C); posterior: P(π|D, α_1, ..., α_C) = Dir(π|α_1 + m_1, ..., α_C + m_C). P(Y = c|D) = (α_c + m_c) / ∑_{c'=1}^C (α_{c'} + m_{c'}). For discrete inputs or features (X_i ∈ {1, ..., K}): P(X_i = k|Y = c, D) = (α_{k|c}^i + m_i(k, c)) / ∑_{k'=1}^K (α_{k'|c}^i + m_i(k', c)), where m_c = ∑_n I(Y^(n) = c) and m_i(k, c) = ∑_n I(X_i^(n) = k, Y^(n) = c).
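A sketch of these Bayesian predictive estimates with a symmetric Dirichlet prior (a single pseudo-count α on every cell is an illustrative choice, not a prescription from the slides); function name and data are assumptions:

```python
import numpy as np

def nb_bayes_predictive(X, y, K, C, alpha=1.0):
    """Bayesian predictive estimates for naive Bayes with symmetric
    Dirichlet priors (pseudo-count alpha on every cell)."""
    N, D = X.shape
    m_c = np.array([(y == c).sum() for c in range(C)])
    p_y = (alpha + m_c) / (C * alpha + N)                 # P(Y = c | D)

    p_x = np.zeros((D, K, C))                             # P(X_i = k | Y = c, D)
    for c in range(C):
        Xc = X[y == c]
        for i in range(D):
            m_ikc = np.array([(Xc[:, i] == k).sum() for k in range(K)])
            p_x[i, :, c] = (alpha + m_ikc) / (K * alpha + m_ikc.sum())
    return p_y, p_x

# Same illustrative toy data as before (made up); alpha=1 smooths zero counts.
X = np.array([[0, 1], [0, 0], [1, 1], [1, 1], [0, 1]])
y = np.array([0, 0, 1, 1, 1])
p_y, p_x = nb_bayes_predictive(X, y, K=2, C=2)
print(p_y)    # [3/7, 4/7]
```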

52 Global & local independence For nodes with no parents, the parameters define a single distribution; Bayesian or ML learning can be used as in simple density estimation of a single variable. More generally, for tabular CPDs there are multiple categorical distributions per node, one for every combination of the parent variables. The learning objective decomposes into multiple terms, one for the subset of training data with each parent configuration; apply independent Bayesian or ML learning to each.

53 Shared parameters Sharing CPDs, P(X_i|Pa_{X_i}, θ) = P(X_j|Pa_{X_j}, θ), or sharing parameters within a single CPD, P(X_i|u_1, θ_i^s) = P(X_i|u_2, θ_i^s). MLE: for networks with shared CPDs, sufficient statistics accumulate over all uses of the CPD. Aggregate sufficient statistics: collect all the instances generated from the same conditional distribution and combine their sufficient statistics.

54 Markov chain P(X_1, ..., X_T|θ) = P(X_1|π) ∏_{t=2}^T P(X_t|X_{t−1}, A^{t−1,t}), for the chain X_1 → X_2 → ... → X_{T−1} → X_T. Shared parameters: P(X_1, ..., X_T|θ) = P(X_1|π) ∏_{t=2}^T P(X_t|X_{t−1}, A).

55 Markov chain Parameters: π and the transition matrix A = [θ_{j→i}]. Initial state probability: π_i = P(X_1 = i), 1 ≤ i ≤ K. State transition probability: A_{ji} = θ_{j→i} = P(X_{t+1} = i|X_t = j), with ∑_{i=1}^K A_{ji} = 1, 1 ≤ i, j ≤ K.

56 Markov chain: Parameter learning by MLE P(D|θ) = ∏_{n=1}^N [P(X_1^(n)|π) ∏_{t=2}^T P(X_t^(n)|X_{t−1}^(n), A)]. MLE: Â_{ji} = ∑_{n=1}^N ∑_{t=2}^T I(X_{t−1}^(n) = j, X_t^(n) = i) / ∑_{n=1}^N ∑_{t=2}^T I(X_{t−1}^(n) = j), and π̂_i = (1/N) ∑_{n=1}^N I(X_1^(n) = i).

57 Markov chain: Bayesian learning Dirichlet prior α on A, where A_{ji} = P(X_{t+1} = i|X_t = j): assign a Dirichlet prior to each row of the transition matrix, A_{j,·} ~ Dir(α_{j,1}, ..., α_{j,K}). Then P(X_{t+1} = i|X_t = j, D, α_{j,·}) = (∑_{n=1}^N ∑_{t=2}^T I(X_{t−1}^(n) = j, X_t^(n) = i) + α_{j,i}) / (∑_{n=1}^N ∑_{t=2}^T I(X_{t−1}^(n) = j) + ∑_{i'=1}^K α_{j,i'}).
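A sketch covering the last two slides: transition-count MLE, and the same counts with Dirichlet pseudo-counts added (here a symmetric α as an illustrative choice). The helper name and sequences are made up:

```python
import numpy as np

def estimate_transitions(seqs, K, alpha=0.0):
    """Estimate a K x K transition matrix from observed state sequences.
    alpha = 0 gives the MLE; alpha > 0 adds symmetric Dirichlet
    pseudo-counts (Bayesian predictive estimate), smoothing unseen transitions."""
    counts = np.zeros((K, K))
    for s in seqs:
        for j, i in zip(s[:-1], s[1:]):        # consecutive pairs (X_{t-1}, X_t)
            counts[j, i] += 1
    A = counts + alpha
    return A / A.sum(axis=1, keepdims=True)    # normalize each row

# Illustrative sequences over K = 2 states (made-up data).
seqs = [[0, 0, 1, 1, 0], [1, 1, 0, 0]]
print(estimate_transitions(seqs, K=2))               # MLE
print(estimate_transitions(seqs, K=2, alpha=1.0))    # Dirichlet-smoothed
```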

58 Hyper-parameters in the Bayesian approach So far we have considered either parameter independence or parameter sharing. Hierarchical priors give us a flexible language to introduce dependencies in the priors over parameters: they are useful when we have a small number of examples relevant to each parameter but believe that some parameters are reasonably similar. In such situations, hierarchical priors spread the effect of the observations between parameters with shared hyper-parameters. However, when we use hyper-parameters we transform our learning problem into one that includes a hidden variable.

59 Hyper-parameters: Example [Koller & Friedman Book] The effect of the prior will be to shift both θ_{Y|x^0} and θ_{Y|x^1} to be closer to each other.

60 Hyper-parameters: Example [Koller & Friedman Book] If we believe that two variables Y and Z depend on X in a similar (but not identical) way, a shared hyper-parameter can couple their priors.

61 References Koller and Friedman, Chapter
