CSE 150, Assignment 6, Summer 2016. Maximum likelihood estimation. Out: Thu Jul 14. Due: Tue Jul 19.

6.1 Maximum likelihood estimation

[Figure: a belief network over four nodes A, B, C, D, with CPTs P(A), P(B|A), P(C|A,B), and P(D|B,C).]

(a) Complete data

Consider a complete data set of i.i.d. examples {(a_t, b_t, c_t, d_t)} for t = 1, ..., T, drawn from the joint distribution of the above belief network. Compute the maximum likelihood estimates of the conditional probability tables (CPTs) shown below for this data set. Express your answers in terms of indicator functions, such as:

    I(a, a_t) = 1 if a = a_t, and 0 if a ≠ a_t.

For example, in terms of this indicator function, the maximum likelihood estimate for the CPT at node A is given by

    P(A=a) = (1/T) ∑_{t=1}^{T} I(a, a_t).

Complete the numerators and denominators in the expressions below.

    P(B=b | A=a) =

    P(C=c | A=a, B=b) =

    P(D=d | B=b, C=c) =
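To make the counting concrete, here is a minimal Python sketch of maximum likelihood estimation by indicator counts. It assumes the complete data set is stored as a list of tuples (a_t, b_t, c_t, d_t); the function names and data format are illustrative assumptions, not part of the assignment.

```python
# Minimal sketch, assuming the complete data set is a Python list of tuples
# (a_t, b_t, c_t, d_t). Function names and data format are illustrative only.

def ml_estimate_pA(data, a):
    """P(A=a) = (1/T) * sum_t I(a, a_t), exactly the example given in the problem."""
    T = len(data)
    return sum(1 for (a_t, _, _, _) in data if a_t == a) / T

def ml_estimate_pB_given_A(data, b, a):
    """A conditional CPT entry as a ratio of indicator counts (hypothetical helper)."""
    num = sum(1 for (a_t, b_t, _, _) in data if a_t == a and b_t == b)
    den = sum(1 for (a_t, _, _, _) in data if a_t == a)
    return num / den if den > 0 else 0.0
```

The remaining CPTs follow the same counting pattern, with the count of the parent configuration in the denominator.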

(b) Posterior probability

Consider the belief network shown above, with observed nodes B and D and hidden nodes A and C. Compute the posterior probability P(a, c | b, d) in terms of the CPTs of the belief network, that is, in terms of P(a), P(b|a), P(c|a,b), and P(d|b,c).

(c) Posterior probability

Compute the posterior probabilities P(a | b, d) and P(c | b, d) in terms of your answer from part (b). In other words, in this problem, you may assume that P(a, c | b, d) is given.

(d) Log-likelihood

Consider a partially complete data set of i.i.d. examples {(b_t, d_t)} for t = 1, ..., T, drawn from the joint distribution of the above belief network. The log-likelihood of the data set is given by:

    L = ∑_t log P(B=b_t, D=d_t).

Compute this log-likelihood in terms of the CPTs of the belief network. You may re-use work from earlier parts of the problem.
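As a numerical cross-check for parts (b) and (c), here is a minimal Python sketch that computes these posteriors by brute-force enumeration over the hidden nodes. It assumes all variables are binary and that the CPTs are available as functions pA, pB, pC, pD; all of these names are assumptions for illustration.

```python
# Minimal sketch, assuming binary variables and CPTs given as Python functions
# pA(a), pB(b, a), pC(c, a, b), pD(d, b, c) that return probabilities.

def joint(a, b, c, d, pA, pB, pC, pD):
    """P(a, b, c, d) factorized according to the belief network."""
    return pA(a) * pB(b, a) * pC(c, a, b) * pD(d, b, c)

def posterior_ac_given_bd(a, c, b, d, pA, pB, pC, pD):
    """P(a, c | b, d) by Bayes' rule: normalize the joint over the hidden nodes A, C."""
    numerator = joint(a, b, c, d, pA, pB, pC, pD)
    denominator = sum(joint(ap, b, cp, d, pA, pB, pC, pD)
                      for ap in (0, 1) for cp in (0, 1))
    return numerator / denominator

def posterior_a_given_bd(a, b, d, pA, pB, pC, pD):
    """P(a | b, d) by marginalizing P(a, c | b, d) over c, as in part (c)."""
    return sum(posterior_ac_given_bd(a, c, b, d, pA, pB, pC, pD) for c in (0, 1))
```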

(e) EM algorithm

The posterior probabilities from part (b) can be used by an EM algorithm to estimate CPTs that maximize the log-likelihood from part (d). Complete the numerator and denominator in the expressions below for the EM update rules. Simplify your answers as much as possible, expressing them in terms of the posterior probabilities P(a, c | b_t, d_t), P(a | b_t, d_t), and P(c | b_t, d_t), as well as the indicator functions I(b, b_t) and I(d, d_t).

    P(A=a) ←

    P(B=b | A=a) ←

    P(C=c | A=a, B=b) ←

    P(D=d | B=b, C=c) ←
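For intuition about the shape of these updates, here is a minimal sketch of the E-step/M-step for one CPT, re-estimating P(A=a) from posterior "soft counts". It reuses the hypothetical posterior_a_given_bd helper from the previous sketch and is only an illustration, not the worked answer to part (e).

```python
# Minimal sketch of one EM update, assuming binary variables and observed data
# given as a list of pairs (b_t, d_t). The posterior_a_given_bd helper is the
# hypothetical function from the previous sketch; updates for the other CPTs
# follow the same pattern, with posteriors standing in for the indicator counts
# of part (a).

def em_update_pA(data, pA, pB, pC, pD):
    """Re-estimate P(A=a) as the average posterior probability of A=a over the data."""
    T = len(data)
    return {a: sum(posterior_a_given_bd(a, b_t, d_t, pA, pB, pC, pD)
                   for (b_t, d_t) in data) / T
            for a in (0, 1)}
```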

6.2 EM algorithm for noisy-OR

[Figure: a noisy-OR belief network with parent nodes X_1, X_2, X_3, ..., X_n and child node Y.]

Consider the belief network on the right, with binary random variables X ∈ {0,1}^n and Y ∈ {0,1} and a noisy-OR conditional probability table (CPT). The noisy-OR CPT is given by:

    P(Y=1 | X) = 1 − ∏_{i=1}^{n} (1 − p_i)^{X_i},

which is expressed in terms of the noisy-OR parameters p_i ∈ [0,1]. In this problem, you will use the EM algorithm derived in class for estimating the noisy-OR parameters p_i. For a data set {(x_t, y_t)} with t = 1, ..., T, the (normalized) conditional log-likelihood is given by:

    L = (1/T) ∑_{t=1}^{T} log P(Y=y_t | X=x_t).

Download the data files on the course web site, and use the EM algorithm to estimate the parameters p_i. The data set has T=267 examples over n=23 inputs. For those interested, more information about this data set is available here:

    http://archive.ics.uci.edu/ml/datasets/spect+heart

However, be sure to use the data files provided on the course web site, as they have been specially assembled for this assignment. The EM update for this model is given by:

    p_i ← (1/T_i) ∑_{t=1}^{T} [ y_t x_{it} p_i / (1 − ∏_{j=1}^{n} (1 − p_j)^{x_{jt}}) ],

where T_i is the number of examples in which X_i = 1. Initialize all p_i = 1/(2n) and perform 512 iterations of the EM algorithm. At each iteration, compute the conditional log-likelihood shown above. (If you have implemented the EM algorithm correctly, this conditional log-likelihood will always increase from one iteration to the next.) Also compute the number of mistakes M ≤ T made by the model at each iteration; a mistake occurs either when y_t = 0 and P(Y=1 | x_t) ≥ 0.5 (a false positive) or when y_t = 1 and P(Y=1 | x_t) ≤ 0.5 (a false negative). The number of mistakes should generally decrease as the model is trained, though it is not guaranteed to do so at each iteration.

Turn in all of the following:

(a) a hard-copy print-out of your source code
(b) a plot or print-out of your final estimated values for p_i
(c) a completed version of the following table
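A minimal NumPy sketch of this EM loop is shown below. The file names spectX.txt and spectY.txt, and the small numerical clipping used to guard against log(0), are assumptions for illustration rather than part of the assignment.

```python
import numpy as np

# Minimal sketch of the EM loop described above. The file names below are
# assumptions; load whatever files the course web site actually provides.
X = np.loadtxt("spectX.txt")            # shape (T, n), binary inputs x_it
y = np.loadtxt("spectY.txt")            # shape (T,),   binary labels y_t
T, n = X.shape

p = np.full(n, 1.0 / (2 * n))           # initialize all p_i = 1/(2n)
T_i = X.sum(axis=0)                     # T_i = count of examples with x_it = 1
                                        # (assumes every input appears at least once)

report_at = {0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512}
for it in range(513):
    # P(Y=1 | x_t) = 1 - prod_i (1 - p_i)^{x_it} for every example
    prob1 = 1.0 - np.prod((1.0 - p) ** X, axis=1)
    safe = np.clip(prob1, 1e-12, 1.0 - 1e-12)   # guard against log(0) / divide-by-zero

    if it in report_at:
        L = np.mean(np.where(y == 1, np.log(safe), np.log(1.0 - safe)))
        M = int(np.sum((y == 0) & (prob1 >= 0.5)) + np.sum((y == 1) & (prob1 <= 0.5)))
        print(f"iteration {it:3d}   mistakes M = {M:3d}   log-likelihood L = {L:.4f}")

    if it == 512:
        break                                    # report after 512 updates, then stop

    # EM update: p_i <- (1/T_i) * sum_t y_t * x_it * p_i / P(Y=1 | x_t)
    p = (X * y[:, None] * p[None, :] / safe[:, None]).sum(axis=0) / T_i
```

In this sketch, iteration 0 reports the metrics at the initial parameter values, before any EM update, which appears to be how the partially completed table below is indexed.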

iteration    number of mistakes M    conditional log-likelihood L
    0                 212                      -1.5123
    1                  67
    2                                          -0.4192
    4
    8
   16
   32
   64
  128                  36
  256
  512                                          -0.3100

You should use the already completed entries of this table to check your work. As always, you may program in the language of your choice.