Expectation maximization

Subhransu Maji
CMSCI 689: Machine Learning
14 April 2015

Motivation
Suppose you are building a naive Bayes spam classifier. After you are done, your boss tells you that there is no money to label the data. You have a probabilistic model that assumes labelled data, but you don't have any labels. Can you still do something? Amazingly, you can: treat the labels as hidden variables and try to learn them simultaneously along with the parameters of the model. Expectation Maximization (EM) is a broad family of algorithms for solving hidden-variable problems. In today's lecture we will derive EM algorithms for clustering and naive Bayes classification, and learn why EM works. 2/19

Gaussian mixture model for clustering
Suppose the data comes from a Gaussian Mixture Model (GMM): there are K clusters, and the data from cluster k is drawn from a Gaussian with mean μ_k and variance σ_k². We will assume that the data comes with labels (we will soon remove this assumption).
Generative story of the data: for each example n = 1, 2, ..., N:
  choose a label y_n ~ Mult(θ_1, θ_2, ..., θ_K)
  choose an example x_n ~ N(μ_{y_n}, σ_{y_n}²)
Likelihood of the data:
  p(D) = ∏_n p(y_n) p(x_n | y_n) = ∏_n θ_{y_n} N(x_n; μ_{y_n}, σ_{y_n}²)
       = ∏_n θ_{y_n} (2π σ_{y_n}²)^(−D/2) exp( −‖x_n − μ_{y_n}‖² / (2 σ_{y_n}²) )
3/19

GMM: known labels
Likelihood of the data:
  p(D) = ∏_n θ_{y_n} (2π σ_{y_n}²)^(−D/2) exp( −‖x_n − μ_{y_n}‖² / (2 σ_{y_n}²) )
If you knew the labels y_n, the maximum-likelihood estimates of the parameters are easy:
  θ_k = (1/N) Σ_n [y_n = k]                              (fraction of examples with label k)
  μ_k = Σ_n [y_n = k] x_n / Σ_n [y_n = k]                (mean of the examples with label k)
  σ_k² = Σ_n [y_n = k] ‖x_n − μ_k‖² / Σ_n [y_n = k]      (variance of the examples with label k)
4/19
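
As a concrete illustration of the known-label estimates on slide 4, here is a small NumPy sketch (not from the original slides; the function name and the spherical-Gaussian assumption are mine):

import numpy as np

def gmm_mle_known_labels(X, y, K):
    """Closed-form ML estimates for a spherical GMM when the labels are observed.

    X: (N, D) data matrix; y: (N,) integer labels in {0, ..., K-1}.
    Assumes every cluster contains at least one example.
    """
    N, D = X.shape
    theta = np.zeros(K)
    mu = np.zeros((K, D))
    sigma2 = np.zeros(K)
    for k in range(K):
        mask = (y == k)
        theta[k] = mask.mean()                                    # fraction of examples with label k
        mu[k] = X[mask].mean(axis=0)                              # mean of the examples with label k
        sigma2[k] = ((X[mask] - mu[k]) ** 2).sum(axis=1).mean()   # average squared distance to the mean
    return theta, mu, sigma2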

GMM: unknown labels
Now suppose you didn't have the labels y_n. Analogous to k-means, one solution is to iterate: start by guessing the parameters and then repeat two steps: estimate the labels given the parameters; estimate the parameters given the labels. In k-means we assigned each point to a single cluster, also called hard assignment (point 10 goes to cluster 2). In expectation maximization (EM) we will use soft assignment (point 10 goes half to cluster 2 and half to cluster 5). Let's define a random variable z_n = [z_{n1}, z_{n2}, ..., z_{nK}] to denote the assignment vector for the nth point.
  Hard assignment: only one of the z_{nk} is 1, the rest are 0.
  Soft assignment: the z_{nk} are positive and sum to 1.
Formally, z_{nk} is the probability that the nth point goes to cluster k. 5/19

GMM: parameter estimation
  z_{nk} = p(y_n = k | x_n) = p(y_n = k, x_n) / p(x_n) ∝ p(y_n = k) p(x_n | y_n = k) = θ_k N(x_n; μ_k, σ_k²)
Given a set of parameters (θ_k, μ_k, σ_k²), z_{nk} is easy to compute. Given the z_{nk}, we can update the parameters (θ_k, μ_k, σ_k²) as:
  θ_k = (1/N) Σ_n z_{nk}                              (fraction of examples with label k, fractional)
  μ_k = Σ_n z_{nk} x_n / Σ_n z_{nk}                   (mean of the examples with label k, fractional)
  σ_k² = Σ_n z_{nk} ‖x_n − μ_k‖² / Σ_n z_{nk}         (variance of the examples with label k, fractional)
We have replaced the indicator variable [y_n = k] with p(y_n = k), which is the expectation of [y_n = k]. This is our guess of the labels. Just like k-means, EM is susceptible to local minima. 6/19

GMM: example
Clustering example: k-means vs GMM (figure). 7/19

The EM framework
We have data with observations x_n and hidden variables y_n, and would like to estimate the parameters θ. The likelihood of the data and hidden variables is p(x_n, y_n | θ). Only the x_n are known, so we compute the data likelihood by marginalizing out the y_n:
  p(D | θ) = ∏_n p(x_n | θ) = ∏_n Σ_{y_n} p(x_n, y_n | θ)
Parameter estimation by maximizing the log-likelihood:
  θ_ML = arg max_θ Σ_n log Σ_{y_n} p(x_n, y_n | θ)
This is hard to maximize since the sum is inside the log. 8/19
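
The two GMM updates above can be turned into a full EM loop. Below is a minimal NumPy sketch under a spherical-Gaussian assumption (one variance per cluster); the function name, random initialization, fixed iteration count, and the log-space computation of the responsibilities are my own choices, not part of the slides:

import numpy as np

def gmm_em(X, K, n_iters=100, seed=0):
    """EM for a spherical GMM, following the E/M updates of slides 5-6 (minimal sketch)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    theta = np.full(K, 1.0 / K)                      # mixing weights
    mu = X[rng.choice(N, size=K, replace=False)]     # initialize means at random data points
    sigma2 = np.full(K, X.var())                     # crude initial variance for every cluster
    for _ in range(n_iters):
        # E step: z[n, k] = p(y_n = k | x_n) ∝ theta_k N(x_n; mu_k, sigma2_k I)
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)            # (N, K) squared distances
        log_post = np.log(theta) - 0.5 * D * np.log(2 * np.pi * sigma2) - sq / (2 * sigma2)
        z = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        z /= z.sum(axis=1, keepdims=True)
        # M step: the known-label estimates with indicators replaced by soft counts z[n, k]
        Nk = z.sum(axis=0)
        theta = Nk / N
        mu = (z.T @ X) / Nk[:, None]
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        sigma2 = (z * sq).sum(axis=0) / Nk
    return theta, mu, sigma2, z

Because the updates only maximize a lower bound of the likelihood, different random seeds can land in different local minima, just as the slide warns.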

Jensen's inequality
Given a concave function f and a set of weights λ_i ≥ 0 with Σ_i λ_i = 1, Jensen's inequality states that f(Σ_i λ_i x_i) ≥ Σ_i λ_i f(x_i). This is a direct consequence of concavity: f(ax + by) ≥ a f(x) + b f(y) when a ≥ 0, b ≥ 0, a + b = 1. (Figure: a concave f, with f(ax + by) lying above a f(x) + b f(y).) 9/19

The EM framework
Construct a lower bound on the log-likelihood using Jensen's inequality:
  L(θ) = Σ_n log Σ_{y_n} p(x_n, y_n | θ)
       = Σ_n log Σ_{y_n} q(y_n) [ p(x_n, y_n | θ) / q(y_n) ]
       ≥ Σ_n Σ_{y_n} q(y_n) log [ p(x_n, y_n | θ) / q(y_n) ]          (Jensen's inequality, with q(y_n) playing the role of λ)
       = Σ_n Σ_{y_n} [ q(y_n) log p(x_n, y_n | θ) − q(y_n) log q(y_n) ]  ≜  L̂(θ)
Maximize the lower bound (the second term is independent of θ):
  arg max_θ Σ_n Σ_{y_n} q(y_n) log p(x_n, y_n | θ)
10/19

Lower bound illustrated
Maximizing the lower bound increases the value of the original function if the lower bound touches the function at the current value: L̂(θ_t) = L(θ_t) and L̂(θ) ≤ L(θ). (Figure: L̂ touches L at θ_t; its maximizer θ_{t+1} satisfies L(θ_{t+1}) ≥ L(θ_t).) 11/19

An optimal lower bound
Any choice of the probability distribution q(y) is valid as long as the lower bound touches the function at the current estimate of θ. We can then pick the optimal q(y) by maximizing the lower bound over q:
  arg max_q Σ_n Σ_{y_n} [ q(y_n) log p(x_n, y_n | θ) − q(y_n) log q(y_n) ]
This gives us q(y_n) ∝ p(y_n | x_n, θ_t). (Proof: use Lagrange multipliers with the sum-to-one constraint.) This is the distribution of the hidden variables conditioned on the data and the current estimate of the parameters, which is exactly what we computed in the GMM example. 12/19
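
To make the lower-bound argument concrete, here is a small numerical check (not from the slides; the numbers are made up) that L̂ ≤ L for an arbitrary q and that the bound is tight when q is the posterior p(y | x, θ), as claimed on slide 12. It uses a single data point from a two-component 1-D GMM:

import numpy as np

theta = np.array([0.3, 0.7])          # mixing weights
mu = np.array([-1.0, 2.0])            # component means
sigma2 = np.array([0.5, 1.0])         # component variances
x = 0.4

# joint p(x, y) for y = 0, 1 and the exact log-likelihood L = log sum_y p(x, y)
joint = theta * np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
L = np.log(joint.sum())

def lower_bound(q):
    """L_hat(q) = sum_y [ q(y) log p(x, y) - q(y) log q(y) ]."""
    return np.sum(q * np.log(joint) - q * np.log(q))

q_arbitrary = np.array([0.5, 0.5])
q_posterior = joint / joint.sum()     # the optimal q from slide 12
print(L, lower_bound(q_arbitrary), lower_bound(q_posterior))
assert lower_bound(q_arbitrary) <= L + 1e-12          # Jensen's inequality
assert abs(lower_bound(q_posterior) - L) < 1e-12      # bound is tight at the posterior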

The EM algorithm
We have data with observations x_n and hidden variables y_n, and would like to estimate the parameters θ of the distribution p(x | θ).
EM algorithm:
  Initialize the parameters θ randomly.
  Iterate between the following two steps:
    E step: compute the probability distribution over the hidden variables, q(y_n) ∝ p(y_n | x_n, θ).
    M step: maximize the lower bound, θ ← arg max_θ Σ_n Σ_{y_n} q(y_n) log p(x_n, y_n | θ).
The EM algorithm is a great candidate when the M step can be done easily but p(x | θ) cannot be easily optimized over θ. For example, for GMMs it was easy to compute the means and variances given the memberships. 13/19

Naive Bayes: revisited
Consider the binary prediction problem. Let the data be distributed according to a probability distribution
  p_θ(y, x) = p_θ(y, x_1, x_2, ..., x_D)
We can simplify this using the chain rule of probability:
  p_θ(y, x) = p_θ(y) p_θ(x_1 | y) p_θ(x_2 | x_1, y) ... p_θ(x_D | x_1, x_2, ..., x_{D−1}, y)
Naive Bayes assumption:
  p_θ(x_d | x_{d'}, y) = p_θ(x_d | y)  for all d' ≠ d
E.g., the words "free" and "money" are independent given spam. 14/19

Naive Bayes: a simple case
Case: binary labels and binary features. Probability of the data (1 + 2D parameters):
  p_θ(y) = Bernoulli(θ_0)
  p_θ(x_d | y = +1) = Bernoulli(θ_d⁺)
  p_θ(x_d | y = −1) = Bernoulli(θ_d⁻)
  p_θ(y, x) = p_θ(y) ∏_d p_θ(x_d | y)
            = θ_0^[y=+1] (1 − θ_0)^[y=−1]
              × ∏_d (θ_d⁺)^[x_d=1, y=+1] (1 − θ_d⁺)^[x_d=0, y=+1]     // label +1
              × ∏_d (θ_d⁻)^[x_d=1, y=−1] (1 − θ_d⁻)^[x_d=0, y=−1]     // label -1
15/19

Naive Bayes: parameter estimation
Given data we can estimate the parameters by maximizing the data likelihood. The maximum-likelihood estimates are:
  θ̂_0 = (1/N) Σ_n [y_n = +1]                                   // fraction of the data with label +1
  θ̂_d⁺ = Σ_n [x_{n,d} = 1, y_n = +1] / Σ_n [y_n = +1]           // fraction of the instances with x_d = 1 among label +1
  θ̂_d⁻ = Σ_n [x_{n,d} = 1, y_n = −1] / Σ_n [y_n = −1]           // fraction of the instances with x_d = 1 among label -1
16/19
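
The maximum-likelihood estimates on slide 16 are just counting. Here is a minimal NumPy sketch, assuming labels in {+1, −1} and a binary feature matrix (the function and variable names are mine, not from the slides):

import numpy as np

def nb_mle(X, y):
    """ML estimates for the binary naive Bayes model of slides 15-16.

    X: (N, D) binary feature matrix; y: (N,) labels in {+1, -1}.
    Returns theta0 = p(y = +1), theta_plus[d] = p(x_d = 1 | y = +1),
    and theta_minus[d] = p(x_d = 1 | y = -1).
    """
    pos, neg = (y == 1), (y == -1)
    theta0 = pos.mean()                       # fraction of the data with label +1
    theta_plus = X[pos].mean(axis=0)          # fraction of x_d = 1 among +1 examples, per feature
    theta_minus = X[neg].mean(axis=0)         # fraction of x_d = 1 among -1 examples, per feature
    return theta0, theta_plus, theta_minus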

Naive Bayes: EM
Now suppose you don't have the labels y_n.
  Initialize the parameters θ randomly.
  E step: compute the distribution over the hidden variables q(y_n):
    q(y_n = +1) = p(y_n = +1 | x_n, θ) ∝ θ_0 ∏_d (θ_d⁺)^[x_{n,d}=1] (1 − θ_d⁺)^[x_{n,d}=0]
  M step: estimate θ given the guesses:
    θ_0 = (1/N) Σ_n q(y_n = +1)                                    // fraction of the data with label +1
    θ_d⁺ = Σ_n [x_{n,d} = 1] q(y_n = +1) / Σ_n q(y_n = +1)          // fraction of the instances with x_d = 1 among +1
    θ_d⁻ = Σ_n [x_{n,d} = 1] q(y_n = −1) / Σ_n q(y_n = −1)          // fraction of the instances with x_d = 1 among -1
(A code sketch of these updates appears after the credits below.) 17/19

Summary
Expectation maximization is a general technique to estimate the parameters of probabilistic models when some observations are hidden. EM iterates between estimating the hidden variables and optimizing the parameters given the hidden variables. EM can be seen as maximizing a lower bound on the data log-likelihood; we used Jensen's inequality to switch the log-sum to a sum-log. EM can be used for learning:
  mixtures of distributions for clustering, e.g. GMMs
  parameters of hidden Markov models (next lecture)
  topic models in NLP
  probabilistic PCA
18/19

Slides credit
Some of the slides are based on the CIML book by Hal Daumé III. The figure for the EM lower bound is based on cxwangyin.wordpress.com/2008/11/. The clustering example (k-means vs GMM) is from github/nicta/mlss/tree/master/clustering/. 19/19
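
As noted on slide 17, here is a minimal NumPy sketch of EM for the binary naive Bayes model with unobserved labels. It is an illustration only: the function name, random initialization, fixed iteration count, and the small clipping constant used to avoid log(0) are my own choices, not part of the slides:

import numpy as np

def naive_bayes_em(X, n_iters=50, seed=0, eps=1e-9):
    """EM for binary naive Bayes with hidden labels (the slide 17 updates).

    X: (N, D) binary feature matrix. Returns the parameters and the
    posterior q[n] = q(y_n = +1).
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    theta0 = 0.5
    theta_plus = rng.uniform(0.25, 0.75, size=D)
    theta_minus = rng.uniform(0.25, 0.75, size=D)
    for _ in range(n_iters):
        # E step: q(y_n = +1) ∝ theta0 * prod_d theta_plus_d^x_d * (1 - theta_plus_d)^(1 - x_d)
        log_pos = np.log(theta0) + X @ np.log(theta_plus) + (1 - X) @ np.log(1 - theta_plus)
        log_neg = np.log(1 - theta0) + X @ np.log(theta_minus) + (1 - X) @ np.log(1 - theta_minus)
        q = 1.0 / (1.0 + np.exp(log_neg - log_pos))
        # M step: fractional counts replace the indicator counts of slide 16
        theta0 = np.clip(q.mean(), eps, 1 - eps)
        theta_plus = np.clip((X * q[:, None]).sum(axis=0) / q.sum(), eps, 1 - eps)
        theta_minus = np.clip((X * (1 - q)[:, None]).sum(axis=0) / (1 - q).sum(), eps, 1 - eps)
    return theta0, theta_plus, theta_minus, q

Since the labels are never observed, the two clusters found by this procedure are identified only up to swapping +1 and -1, and, as with the GMM, different initializations can give different local optima.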
