Course 395: Machine Learning - Lectures


Course 395: Machine Learning - Lectures
Lecture 1-2: Concept Learning (M. Pantic)
Lecture 3-4: Decision Trees & CBC Intro (M. Pantic)
Lecture 5-6: Artificial Neural Networks (S. Zafeiriou)
Lecture 7-8: Instance Based Learning (M. Pantic)
Lecture 9-10: Genetic Algorithms (M. Pantic)
Lecture 11-12: Evaluating Hypotheses (THs)
Lecture 13-14: Bayesian Learning - ML Estimation (S. Zafeiriou)
Lecture 15-16: Expectation Maximisation (S. Zafeiriou)
Lecture 17-18: Inductive Logic Programming (S. Muggleton)

Bayesian Learning - Expectation Maximisation
Reading: Slides

ML estimation
Consider a dataset $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ and a model $f$. The maximum likelihood estimate is given by:
$f^* = \arg\max_f p(D \mid f)$

ML estimation
Assuming that the samples are conditionally independent given $f$:
$f^* = \arg\max_f \prod_{i=1}^{n} p(y_i \mid f)$
Further assuming $y_i = f(x_i) + e_i$ with $y_i \mid f \sim N(f(x_i), \sigma^2)$:
$f^* = \arg\max_f \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2\sigma^2}\bigl(y_i - f(x_i)\bigr)^2\right)$

ML estimation (figure slide)

ML estimation
Choosing to maximise its logarithm we get
$f^* = \arg\max_f \sum_{i=1}^{n} \left( \ln\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2\sigma^2}\bigl(y_i - f(x_i)\bigr)^2 \right)$
Removing the constant terms we get
$f^* = \arg\min_f \sum_{i=1}^{n} \bigl(f(x_i) - y_i\bigr)^2$
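A minimal numerical sketch of this reduction (the linear model f(x) = w*x + b, the synthetic data, and the noise level are illustrative assumptions, not from the lecture): maximising the Gaussian log-likelihood over f is the same as minimising the sum of squared errors, so the least-squares fit is the ML estimate.

```python
import numpy as np

# Illustrative synthetic data: y = 2x + 1 + Gaussian noise (assumed, not from the lecture)
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.shape)

# Least-squares fit of f(x) = w*x + b, i.e. argmin_f sum_i (f(x_i) - y_i)^2
X = np.column_stack([x, np.ones_like(x)])
w, b = np.linalg.lstsq(X, y, rcond=None)[0]

# The same (w, b) maximises the Gaussian log-likelihood for any fixed sigma, since
# log p(D | f) = const - (1 / (2*sigma^2)) * sum_i (y_i - f(x_i))^2
sigma = 0.1
log_lik = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                 - (y - (w * x + b)) ** 2 / (2 * sigma**2))
print(f"w = {w:.3f}, b = {b:.3f}, log-likelihood = {log_lik:.2f}")
```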

ML: simple example
Consider a coin flipping experiment with a pair of coins A and B of unknown biases $\theta_A$ and $\theta_B$. Coin A lands on tails with probability $1 - \theta_A$ (and similarly for coin B). We want to estimate $\theta = (\theta_A, \theta_B)$.

ML: simple example
Randomly choose one of the two coins (with equal probability) and perform 10 independent tosses; repeat this five times (50 coin tosses in total).

ML: simple example
$x_i$: number of heads observed during the $i$-th set of tosses, $x = (x_1, x_2, \ldots, x_5)$, $x_i \in \{0, 1, \ldots, 10\}$
$z_i$: identity of the coin used in the $i$-th set, $z = (z_1, z_2, \ldots, z_5)$, $z_i \in \{A, B\}$

ML: simple example
$H_A$: # of heads using coin A; $F_A$: total # of flips using coin A (and similarly $H_B$, $F_B$ for coin B)
$\hat{\theta}_A = \frac{H_A}{F_A}, \quad \hat{\theta}_B = \frac{H_B}{F_B}$
Maximum likelihood estimation maximises $\log P(x, z \mid \theta)$

ML: simple example
$P(D \mid \theta) = \theta_A^{H_A} (1-\theta_A)^{T_A} \, \theta_B^{H_B} (1-\theta_B)^{T_B}$
$\log P(D \mid \theta) = H_A \log\theta_A + T_A \log(1-\theta_A) + H_B \log\theta_B + T_B \log(1-\theta_B)$
$\frac{\partial \log P}{\partial \theta_A} = 0 \;\Rightarrow\; \theta_A = \frac{H_A}{H_A + T_A}$ (and similarly for $\theta_B$)
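A small sketch of this complete-data ML estimate; the toss counts below are hypothetical, only the closed-form solution $\theta = H/(H+T)$ comes from the slide.

```python
# Complete-data ML estimation for the two coins: theta = #heads / #flips per coin.
# The counts below are hypothetical, for illustration only.
counts = {
    "A": {"heads": 24, "tails": 6},
    "B": {"heads": 9, "tails": 11},
}

theta = {coin: c["heads"] / (c["heads"] + c["tails"]) for coin, c in counts.items()}
print(theta)  # {'A': 0.8, 'B': 0.45}
```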

ML: simple example (figure slide)

Expectation Maximisation
Consider a more challenging setup: we are given $x = (x_1, x_2, \ldots, x_5)$ but not $z$, which are latent (or hidden) variables. Computing proportions of heads for each coin is no longer possible.

Expectation Maximisation
Start with some initial parameters $\hat{\theta}^{(t)} = (\hat{\theta}_A^{(t)}, \hat{\theta}_B^{(t)})$ and determine, for each of the five sets, whether coin A or coin B was more likely to have generated the observed flips. Assuming this data completion is correct, apply regular maximum likelihood to get $\hat{\theta}^{(t+1)}$. Repeat until convergence.

Expectation Maximisation
Compute the probabilities of all possible completions given $\hat{\theta}^{(t)}$ (not just the most probable one). These probabilities are used to create a weighted training set consisting of all completions. A modified ML that deals with weighted training data is applied in order to get the new estimate $\hat{\theta}^{(t+1)}$.

Expectation Maximisation
By using weighted training examples, rather than choosing the single best completion, the EM algorithm accounts for the confidence of the model in each completion of the data.

Expectation Maximisation
$P(D_i, z_i = A \mid \theta) = P(z_i = A)\, P(D_i \mid z_i = A, \theta)$
For a set with 5 heads and 5 tails and current estimate $\hat{\theta}_A = 0.6$:
$P(D_i \mid z_i = A, \theta) = 0.6^{H_A} \cdot 0.4^{T_A} = 0.6^{5} \cdot 0.4^{5} = 0.0007962624, \quad P(z_i = A) = 0.5$

Expectation Maximisation
$P(D_i, z_i = B \mid \theta) = P(z_i = B)\, P(D_i \mid z_i = B, \theta)$
For the same set and current estimate $\hat{\theta}_B = 0.5$:
$P(D_i \mid z_i = B, \theta) = 0.5^{H_B} \cdot 0.5^{T_B} = 0.5^{10} = 0.0009765625, \quad P(z_i = B) = 0.5$

Expectation Maximisation
$w_A = \frac{P(D_i \mid z_i = A, \theta)}{P(D_i \mid z_i = A, \theta) + P(D_i \mid z_i = B, \theta)} \approx 0.45$
$w_B = \frac{P(D_i \mid z_i = B, \theta)}{P(D_i \mid z_i = A, \theta) + P(D_i \mid z_i = B, \theta)} \approx 0.55$
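These weights can be reproduced directly; the sketch below assumes the set in question has 5 heads and 5 tails (consistent with the probabilities quoted above) and equal priors on the two coins.

```python
# E-step weights for one set of tosses with 5 heads and 5 tails,
# given current estimates theta_A = 0.6, theta_B = 0.5 and equal priors.
theta_A, theta_B = 0.6, 0.5
heads, tails = 5, 5

p_A = theta_A**heads * (1 - theta_A)**tails  # 0.0007962624
p_B = theta_B**heads * (1 - theta_B)**tails  # 0.0009765625

w_A = p_A / (p_A + p_B)  # ~0.45
w_B = p_B / (p_A + p_B)  # ~0.55
print(round(w_A, 2), round(w_B, 2))
```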

Expectation Maximisation
EM Algorithm:
E-step: guess a probability distribution over completions of the missing data given the current model.
M-step: re-estimate the model parameters given these completions.
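A compact sketch of the full EM loop for the two-coin problem; the five head counts and the starting values are illustrative assumptions, not data from the slides.

```python
import numpy as np

# Number of heads in each of five sets of 10 tosses (illustrative data).
heads = np.array([5, 9, 8, 4, 7])
flips = 10
tails = flips - heads

theta_A, theta_B = 0.6, 0.5  # illustrative initial guesses

for _ in range(50):
    # E-step: posterior weight that each set was generated by coin A (equal priors).
    p_A = theta_A**heads * (1 - theta_A)**tails
    p_B = theta_B**heads * (1 - theta_B)**tails
    w_A = p_A / (p_A + p_B)
    w_B = 1.0 - w_A

    # M-step: weighted ML re-estimation of the biases.
    theta_A = np.sum(w_A * heads) / np.sum(w_A * flips)
    theta_B = np.sum(w_B * heads) / np.sum(w_B * flips)

print(round(theta_A, 2), round(theta_B, 2))
```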

Expectation Maximisation (figure slide)

EM: Mathematics
Starting from initial parameters $\hat{\theta}^{(t)}$, the E-step constructs a function $g_t(\theta)$ that lower-bounds the incomplete-data log-likelihood $\log P(x \mid \theta)$. In the M-step, $\hat{\theta}^{(t+1)}$ is computed as the maximum of $g_t(\theta)$. In the next E-step a new lower-bound $g_{t+1}(\theta)$ is constructed; maximisation of $g_{t+1}$ gives $\hat{\theta}^{(t+2)}$, and so on.

EM: Mathematics
The EM algorithm derives from the fact that, for any probability distribution $Q(z)$ over the completions,
$\log P(x \mid \theta) \;\ge\; \sum_z Q(z) \log \frac{P(x, z \mid \theta)}{Q(z)} \qquad (1)$
where the inequality is tight when $Q(z) = P(z \mid x, \theta)$.
Jensen's inequality: $f(E[Y]) \ge E[f(Y)]$ for all concave functions $f$ (e.g., $\log$). The bound above follows by letting $Y = \frac{P(x, z \mid \theta)}{Q(z)}$ with $z \sim Q$.

EM: Mathematics
Now, considering the update rule $\hat{\theta}^{(t+1)} = \arg\max_\theta g_t(\theta)$, where
$g_t(\theta) = \sum_z P(z \mid x, \hat{\theta}^{(t)}) \log \frac{P(x, z \mid \theta)}{P(z \mid x, \hat{\theta}^{(t)})}$
Applying the tightness condition to (1), $g_t(\hat{\theta}^{(t)}) = \log P(x \mid \hat{\theta}^{(t)})$. Moreover, $g_t(\hat{\theta}^{(t+1)}) \ge g_t(\hat{\theta}^{(t)})$, and since (1) guarantees that $g_t(\theta)$ is a lower bound of $\log P(x \mid \theta)$:
$\log P(x \mid \hat{\theta}^{(t+1)}) \;\ge\; g_t(\hat{\theta}^{(t+1)}) \;\ge\; g_t(\hat{\theta}^{(t)}) \;=\; \log P(x \mid \hat{\theta}^{(t)})$
Update $\Rightarrow$ monotonic improvement of the ML objective for incomplete data.
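The monotonicity claim can be checked numerically; the sketch below tracks $\log P(x \mid \theta)$ (up to the constant binomial coefficients) across EM updates for the two-coin model, using the same illustrative data as the EM sketch above.

```python
import numpy as np

# Illustrative data: number of heads in each of five sets of 10 tosses.
heads = np.array([5, 9, 8, 4, 7])
flips = 10
tails = flips - heads
theta_A, theta_B = 0.6, 0.5  # illustrative initial guesses

def log_lik(tA, tB):
    # Incomplete-data log-likelihood log P(x | theta) with equal coin priors,
    # omitting the binomial coefficients (constant in theta).
    p_A = tA**heads * (1 - tA)**tails
    p_B = tB**heads * (1 - tB)**tails
    return np.sum(np.log(0.5 * p_A + 0.5 * p_B))

for t in range(10):
    print(t, round(log_lik(theta_A, theta_B), 4))  # never decreases across iterations
    w_A = theta_A**heads * (1 - theta_A)**tails
    w_B = theta_B**heads * (1 - theta_B)**tails
    w_A, w_B = w_A / (w_A + w_B), w_B / (w_A + w_B)
    theta_A = np.sum(w_A * heads) / np.sum(w_A * flips)
    theta_B = np.sum(w_B * heads) / np.sum(w_B * flips)
```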

Course 395: Machine Learning - Lectures
Lecture 1-2: Concept Learning (M. Pantic)
Lecture 3-4: Decision Trees & CBC Intro (M. Pantic)
Lecture 5-6: Artificial Neural Networks (S. Zafeiriou)
Lecture 7-8: Instance Based Learning (M. Pantic)
Lecture 9-10: Genetic Algorithms (M. Pantic)
Lecture 11-12: Evaluating Hypotheses (THs)
Lecture 13-14: Bayesian Learning - ML Estimation (S. Zafeiriou)
Lecture 15-16: Expectation Maximisation (S. Zafeiriou)
Lecture 17-18: Inductive Logic Programming (S. Muggleton)