Expectation maximization
- Aldous Riley
- 5 years ago
Transcription
Expectation maximization. Subhransu Maji, CMPSCI 689: Machine Learning, 14 April 2015

Motivation

Suppose you are building a naive Bayes spam classifier. After you are done, your boss tells you that there is no money to label the data. You have a probabilistic model that assumes labelled data, but you don't have any labels. Can you still do something?

Amazingly, you can. Treat the labels as hidden variables and try to learn them simultaneously along with the parameters of the model.

Expectation Maximization (EM) is a broad family of algorithms for solving hidden variable problems. In today's lecture we will derive EM algorithms for clustering and naive Bayes classification, and learn why EM works.

Gaussian mixture model for clustering

Suppose data comes from a Gaussian Mixture Model (GMM): you have K clusters, and the data from cluster k is drawn from a Gaussian with mean $\mu_k$ and variance $\sigma_k^2$. We will assume that the data comes with labels (we will soon remove this assumption).

Generative story of the data. For each example n = 1, 2, ..., N:
- Choose a label $y_n \sim \mathrm{Mult}(\theta_1, \theta_2, \ldots, \theta_K)$
- Choose an example $x_n \sim \mathcal{N}(\mu_{y_n}, \sigma_{y_n}^2)$

Likelihood of the data:

\[
p(D) = \prod_{n=1}^{N} p(y_n)\, p(x_n \mid y_n) = \prod_{n=1}^{N} \theta_{y_n}\, \mathcal{N}(x_n; \mu_{y_n}, \sigma_{y_n}^2)
\]

\[
p(D) = \prod_{n=1}^{N} \theta_{y_n} \left(2\pi\sigma_{y_n}^2\right)^{-D/2} \exp\left( -\frac{\lVert x_n - \mu_{y_n} \rVert^2}{2\sigma_{y_n}^2} \right)
\]

GMM: known labels

If you knew the labels $y_n$, then the maximum-likelihood estimates of the parameters are easy:

\[
\theta_k = \frac{1}{N} \sum_{n} [y_n = k] \qquad \text{(fraction of examples with label } k\text{)}
\]

\[
\mu_k = \frac{\sum_{n} [y_n = k]\, x_n}{\sum_{n} [y_n = k]} \qquad \text{(mean of all the examples with label } k\text{)}
\]

\[
\sigma_k^2 = \frac{\sum_{n} [y_n = k]\, \lVert x_n - \mu_k \rVert^2}{\sum_{n} [y_n = k]} \qquad \text{(variance of all the examples with label } k\text{)}
\]
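The known-label estimates can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the lecture: the function name `gmm_mle_known_labels`, the 0-based integer labels, and the toy data are assumptions. The variance is additionally divided by the dimension D, which is my assumption for an isotropic Gaussian in D dimensions; for D = 1 it matches the formula above.

```python
import numpy as np

def gmm_mle_known_labels(X, y, K):
    """MLE for a spherical GMM when every label is observed.

    X: (N, D) array of examples; y: (N,) integer labels in {0, ..., K-1}.
    Assumes (hypothetically) that every cluster has at least one example.
    """
    N, D = X.shape
    theta = np.zeros(K)        # mixing proportions theta_k
    mu = np.zeros((K, D))      # cluster means mu_k
    sigma2 = np.zeros(K)       # per-cluster variances sigma_k^2
    for k in range(K):
        mask = (y == k)                    # indicator [y_n = k]
        Nk = mask.sum()
        theta[k] = Nk / N                  # fraction of examples with label k
        mu[k] = X[mask].mean(axis=0)       # mean of examples with label k
        # Squared distances to the mean, averaged over examples and dimensions
        # (the division by D is an assumption; it is a no-op when D = 1).
        sigma2[k] = ((X[mask] - mu[k]) ** 2).sum() / (Nk * D)
    return theta, mu, sigma2

# Toy one-dimensional data with two labelled clusters.
X = np.array([[0.0], [2.0], [10.0], [12.0], [14.0]])
y = np.array([0, 0, 1, 1, 1])
theta, mu, sigma2 = gmm_mle_known_labels(X, y, K=2)
```

On this toy data the estimates are simply per-cluster counts, averages, and average squared deviations, which is why the labelled case needs no iteration.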
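GMM: unknown labels

Now suppose you didn't have labels $y_n$. Analogous to k-means, one solution is to iterate. Start by guessing the parameters and then repeat the two steps:
- Estimate labels given the parameters
- Estimate parameters given the labels

In k-means we assigned each point to a single cluster, also called hard assignment (point 10 goes to cluster 2). In expectation maximization (EM) we will use soft assignment (point 10 goes half to cluster 2 and half to cluster 5).

Let's define a random variable $z_n = [z_{n,1}, z_{n,2}, \ldots, z_{n,K}]$ to denote the assignment vector for the nth point:
- Hard assignment: only one of the $z_{n,k}$ is 1, the rest are 0
- Soft assignment: the $z_{n,k}$ are positive and sum to 1

GMM: parameter estimation

Formally, $z_{n,k}$ is the probability that the nth point goes to cluster k:

\[
z_{n,k} = p(y_n = k \mid x_n) = \frac{P(y_n = k, x_n)}{P(x_n)} \propto P(y_n = k)\, P(x_n \mid y_n = k) = \theta_k\, \mathcal{N}(x_n; \mu_k, \sigma_k^2)
\]

Given a set of parameters $(\theta_k, \mu_k, \sigma_k^2)$, $z_{n,k}$ is easy to compute. Given $z_{n,k}$, we can update the parameters $(\theta_k, \mu_k, \sigma_k^2)$ as:

\[
\theta_k = \frac{1}{N} \sum_{n} z_{n,k} \qquad \text{(fraction of examples with label } k\text{)}
\]

\[
\mu_k = \frac{\sum_{n} z_{n,k}\, x_n}{\sum_{n} z_{n,k}} \qquad \text{(mean of all the fractional examples with label } k\text{)}
\]

\[
\sigma_k^2 = \frac{\sum_{n} z_{n,k}\, \lVert x_n - \mu_k \rVert^2}{\sum_{n} z_{n,k}} \qquad \text{(variance of all the fractional examples with label } k\text{)}
\]

GMM: example

We have replaced the indicator variable $[y_n = k]$ with $z_{n,k} = p(y_n = k \mid x_n)$, which is the expectation of $[y_n = k]$. This is our guess of the labels. Just like k-means, EM is susceptible to local minima.

(Figure: clustering example comparing k-means and GMM.)

The EM framework

We have data with observations $x_n$ and hidden variables $y_n$, and would like to estimate parameters $\theta$. The likelihood of the data and hidden variables is:

\[
p(D) = \prod_{n} p(x_n, y_n \mid \theta)
\]

Only the $x_n$ are known, so we can compute the data likelihood by marginalizing out the $y_n$:

\[
p(X \mid \theta) = \prod_{n} \sum_{y_n} p(x_n, y_n \mid \theta)
\]

Parameter estimation by maximizing the log-likelihood:

\[
\theta_{ML} \leftarrow \arg\max_{\theta} \sum_{n} \log \left( \sum_{y_n} p(x_n, y_n \mid \theta) \right)
\]

This is hard to maximize since the sum is inside the log.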
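The soft-assignment updates for the GMM can be sketched as two small functions that alternate. This is a minimal sketch under stated assumptions, not the lecture's code: the function names, the variance floor of `1e-6`, the fixed iteration count, the synthetic data, and the initial guesses are all hypothetical. As an additional assumption, the variance update divides by the dimension D so that it is the isotropic-Gaussian estimate; for D = 1 it matches the update above.

```python
import numpy as np

def e_step(X, theta, mu, sigma2):
    """Soft assignments z[n, k] = p(y_n = k | x_n) under the current parameters."""
    N, D = X.shape
    sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, K)
    # log of theta_k * N(x_n; mu_k, sigma_k^2 I), computed in log space
    log_w = (np.log(theta)
             - 0.5 * D * np.log(2 * np.pi * sigma2)
             - sq_dist / (2 * sigma2))
    log_w -= log_w.max(axis=1, keepdims=True)    # stabilise before exponentiating
    z = np.exp(log_w)
    return z / z.sum(axis=1, keepdims=True)      # normalise each row to sum to 1

def m_step(X, z):
    """Re-estimate (theta, mu, sigma2) from the fractional assignments z."""
    N, D = X.shape
    Nk = z.sum(axis=0)                           # fractional cluster sizes
    theta = Nk / N
    mu = (z.T @ X) / Nk[:, None]
    sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    sigma2 = (z * sq_dist).sum(axis=0) / (Nk * D)   # division by D is an assumption
    return theta, mu, np.maximum(sigma2, 1e-6)      # floor avoids a collapsed variance

def fit_gmm(X, theta, mu, sigma2, n_iters=50):
    """Alternate the two steps for a fixed (hypothetical) number of iterations."""
    for _ in range(n_iters):
        z = e_step(X, theta, mu, sigma2)
        theta, mu, sigma2 = m_step(X, z)
    return theta, mu, sigma2, z

# Synthetic data: two well-separated spherical clusters in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(8.0, 1.0, size=(100, 2))])
theta, mu, sigma2, z = fit_gmm(
    X,
    theta=np.array([0.5, 0.5]),
    mu=np.array([[1.0, 1.0], [6.0, 6.0]]),
    sigma2=np.array([1.0, 1.0]),
)
```

Because EM is susceptible to local optima, a different initial guess for `mu` can lead to a different solution; in practice one would typically try several initialisations.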
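Jensen's inequality

Given a concave function f and a set of weights $\lambda_i \ge 0$ with $\sum_i \lambda_i = 1$, Jensen's inequality states that

\[
f\left( \sum_i \lambda_i x_i \right) \ge \sum_i \lambda_i f(x_i)
\]

This is a direct consequence of concavity:

\[
f(ax + by) \ge a f(x) + b f(y) \quad \text{when } a \ge 0,\ b \ge 0,\ a + b = 1
\]

(Figure: a concave function f with f(x), f(y), f(ax + by) and the chord value a f(x) + b f(y) marked.)

The EM framework

Construct a lower bound on the log-likelihood using Jensen's inequality, with $f = \log$, weights $\lambda = q(y_n)$ and points $x = p(x_n, y_n \mid \theta) / q(y_n)$:

\[
\begin{aligned}
L(\theta) &= \sum_{n} \log \sum_{y_n} p(x_n, y_n \mid \theta) \\
&= \sum_{n} \log \sum_{y_n} q(y_n)\, \frac{p(x_n, y_n \mid \theta)}{q(y_n)} \\
&\ge \sum_{n} \sum_{y_n} q(y_n) \log \frac{p(x_n, y_n \mid \theta)}{q(y_n)} \\
&= \sum_{n} \sum_{y_n} \left[ q(y_n) \log p(x_n, y_n \mid \theta) - q(y_n) \log q(y_n) \right] \triangleq \hat{L}(\theta)
\end{aligned}
\]

Maximize the lower bound. The second term is independent of $\theta$, so:

\[
\theta \leftarrow \arg\max_{\theta} \sum_{n} \sum_{y_n} q(y_n) \log p(x_n, y_n \mid \theta)
\]

Lower bound illustrated

Maximizing the lower bound increases the value of the original function if the lower bound touches the function at the current value: $L(\theta_{t+1}) \ge \hat{L}(\theta_{t+1}) \ge \hat{L}(\theta_t) = L(\theta_t)$.

(Figure: the log-likelihood $L(\theta)$ and the lower bound $\hat{L}$, with $\hat{L}(\theta_t)$ and $\hat{L}(\theta_{t+1})$ marked at $\theta_t$ and $\theta_{t+1}$, and $L(\theta_t) = \hat{L}(\theta_t)$.)

An optimal lower bound

Any choice of the probability distribution $q(y_n)$ is valid as long as the lower bound touches the function at the current estimate of $\theta$. We can then pick the optimal $q(y_n)$ by maximizing the lower bound:

\[
\arg\max_{q} \sum_{n} \sum_{y_n} \left[ q(y_n) \log p(x_n, y_n \mid \theta_t) - q(y_n) \log q(y_n) \right]
\]

This gives us $q(y_n) \leftarrow p(y_n \mid x_n, \theta_t)$. Proof: use Lagrangian multipliers with the sum-to-one constraint.

This is the distribution of the hidden variables conditioned on the data and the current estimate of the parameters. It is exactly what we computed in the GMM example.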
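The Lagrangian argument mentioned in the proof can be written out as follows. This is a sketch of the standard derivation for a single example n, not a transcription of the slides; the multiplier symbol $\nu$ is my notation, chosen to avoid a clash with the Jensen weights $\lambda$.

```latex
% Maximise the bound over q(y_n) subject to \sum_{y_n} q(y_n) = 1.
\[
J(q, \nu) = \sum_{y_n} \left[ q(y_n) \log p(x_n, y_n \mid \theta_t) - q(y_n) \log q(y_n) \right]
            + \nu \left( \sum_{y_n} q(y_n) - 1 \right)
\]
% Stationarity with respect to each q(y_n):
\[
\frac{\partial J}{\partial q(y_n)} = \log p(x_n, y_n \mid \theta_t) - \log q(y_n) - 1 + \nu = 0
\quad\Longrightarrow\quad
q(y_n) = e^{\nu - 1}\, p(x_n, y_n \mid \theta_t)
\]
% The sum-to-one constraint fixes the constant e^{\nu - 1}:
\[
q(y_n) = \frac{p(x_n, y_n \mid \theta_t)}{\sum_{y'} p(x_n, y' \mid \theta_t)} = p(y_n \mid x_n, \theta_t)
\]
% Substituting back shows the bound is tight at \theta_t:
\[
\sum_{y_n} q(y_n) \log \frac{p(x_n, y_n \mid \theta_t)}{p(y_n \mid x_n, \theta_t)}
= \sum_{y_n} q(y_n) \log p(x_n \mid \theta_t) = \log p(x_n \mid \theta_t)
\]
```

Summing the last line over n gives $\hat{L}(\theta_t) = L(\theta_t)$, which is the touching condition used in the lower-bound illustration.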
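The EM algorithm

We have data with observations $x_n$ and hidden variables $y_n$, and would like to estimate parameters $\theta$ of the distribution $p(x \mid \theta)$.

EM algorithm:
- Initialize the parameters $\theta$ randomly
- Iterate between the following two steps:
- E step: compute the probability distribution over the hidden variables, $q(y_n) \leftarrow p(y_n \mid x_n, \theta)$
- M step: maximize the lower bound, $\theta \leftarrow \arg\max_{\theta} \sum_{n} \sum_{y_n} q(y_n) \log p(x_n, y_n \mid \theta)$

The EM algorithm is a great candidate when the M step can be done easily but $p(x \mid \theta)$ cannot be easily optimized over $\theta$. For example, for GMMs it was easy to compute means and variances given the memberships.

Naive Bayes: revisited

Consider the binary prediction problem. Let the data be distributed according to a probability distribution:

\[
p_\theta(y, x) = p_\theta(y, x_1, x_2, \ldots, x_D)
\]

We can simplify this using the chain rule of probability:

\[
\begin{aligned}
p_\theta(y, x) &= p_\theta(y)\, p_\theta(x_1 \mid y)\, p_\theta(x_2 \mid x_1, y) \cdots p_\theta(x_D \mid x_1, x_2, \ldots, x_{D-1}, y) \\
&= p_\theta(y) \prod_{d=1}^{D} p_\theta(x_d \mid x_1, x_2, \ldots, x_{d-1}, y)
\end{aligned}
\]

Naive Bayes assumption:

\[
p_\theta(x_d \mid x_{d'}, y) = p_\theta(x_d \mid y), \quad \forall\, d' \ne d
\]

E.g., the words "free" and "money" are independent given spam.

Naive Bayes: a simple case

Case: binary labels and binary features, with 1 + 2D parameters:

\[
p_\theta(y) = \mathrm{Bernoulli}(\theta_0), \qquad
p_\theta(x_d \mid y = +1) = \mathrm{Bernoulli}(\theta_d^{+}), \qquad
p_\theta(x_d \mid y = -1) = \mathrm{Bernoulli}(\theta_d^{-})
\]

Probability of the data:

\[
\begin{aligned}
p_\theta(y, x) &= p_\theta(y) \prod_{d=1}^{D} p_\theta(x_d \mid y) \\
&= \theta_0^{[y = +1]} (1 - \theta_0)^{[y = -1]} \\
&\quad \times \prod_{d=1}^{D} (\theta_d^{+})^{[x_d = 1,\, y = +1]} (1 - \theta_d^{+})^{[x_d = 0,\, y = +1]} \qquad \text{// label } +1 \\
&\quad \times \prod_{d=1}^{D} (\theta_d^{-})^{[x_d = 1,\, y = -1]} (1 - \theta_d^{-})^{[x_d = 0,\, y = -1]} \qquad \text{// label } -1
\end{aligned}
\]

Naive Bayes: parameter estimation

Given data, we can estimate the parameters by maximizing the data likelihood. The maximum likelihood estimates are:

\[
\hat{\theta}_0 = \frac{\sum_{n} [y_n = +1]}{N} \qquad \text{// fraction of the data with label } +1
\]

\[
\hat{\theta}_d^{+} = \frac{\sum_{n} [x_{d,n} = 1,\, y_n = +1]}{\sum_{n} [y_n = +1]} \qquad \text{// fraction of the instances with 1 among } +1
\]

\[
\hat{\theta}_d^{-} = \frac{\sum_{n} [x_{d,n} = 1,\, y_n = -1]}{\sum_{n} [y_n = -1]} \qquad \text{// fraction of the instances with 1 among } -1
\]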
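With labels available, the naive Bayes estimates above reduce to counting. The following is a minimal NumPy sketch; the function name `naive_bayes_mle`, the array layout (one row per example, one column per feature) and the toy data are assumptions for illustration.

```python
import numpy as np

def naive_bayes_mle(X, y):
    """MLE for binary-feature, binary-label naive Bayes.

    X: (N, D) array with entries in {0, 1}; y: (N,) array with entries in {+1, -1}.
    Assumes (hypothetically) that both labels occur at least once.
    """
    pos = (y == 1)
    neg = (y == -1)
    theta0 = pos.mean()                # fraction of the data with label +1
    theta_pos = X[pos].mean(axis=0)    # per feature: fraction of 1s among label +1
    theta_neg = X[neg].mean(axis=0)    # per feature: fraction of 1s among label -1
    return theta0, theta_pos, theta_neg

# Toy data: four examples, two binary features.
X = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [0, 0]])
y = np.array([1, 1, -1, -1])
theta0, theta_pos, theta_neg = naive_bayes_mle(X, y)
```

Estimates of exactly 0 or 1, as happen for the first feature here, make some feature values impossible under the model; smoothing the counts is a common remedy, although the slides do not discuss it.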
Naive Bayes: EM

Now suppose you don't have labels $y_n$.
- Initialize the parameters $\theta$ randomly
- E step: compute the distribution over the hidden variables $q(y_n)$
- M step: estimate $\theta$ given the guesses

E step:

\[
q(y_n = +1) = p(y_n = +1 \mid x_n, \theta) \propto \theta_0 \prod_{d=1}^{D} (\theta_d^{+})^{[x_{d,n} = 1]} (1 - \theta_d^{+})^{[x_{d,n} = 0]}
\]

M step:

\[
\theta_0 = \frac{\sum_{n} q(y_n = +1)}{N} \qquad \text{// fraction of the data with label } +1
\]

\[
\theta_d^{+} = \frac{\sum_{n} [x_{d,n} = 1]\, q(y_n = +1)}{\sum_{n} q(y_n = +1)} \qquad \text{// fraction of the instances with 1 among } +1
\]

\[
\theta_d^{-} = \frac{\sum_{n} [x_{d,n} = 1]\, q(y_n = -1)}{\sum_{n} q(y_n = -1)} \qquad \text{// fraction of the instances with 1 among } -1
\]

Summary

Expectation maximization:
- A general technique to estimate parameters of probabilistic models when some observations are hidden
- EM iterates between estimating the hidden variables and optimizing parameters given the hidden variables
- EM can be seen as a maximization of a lower bound of the data log-likelihood; we used Jensen's inequality to switch the log-sum to a sum-log

EM can be used for learning:
- mixtures of distributions for clustering, e.g. GMM
- parameters for hidden Markov models (next lecture)
- topic models in NLP
- probabilistic PCA

Slides credit

Some of the slides are based on the CIML book by Hal Daume III. The figure for the EM lower bound is based on cxwangyi.wordpress.com/2008/11/. The clustering comparison of k-means vs GMM is from github/nicta/mlss/tree/master/clustering/.
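The E and M steps for naive Bayes without labels can be sketched as below. This is an illustrative sketch rather than the lecture's implementation: the function names, the clipping constant `eps` (which keeps the parameters away from 0 and 1 so that the logarithms stay finite), the fixed iteration count, the toy data, and the non-random initial guesses are all assumptions.

```python
import numpy as np

def nb_e_step(X, theta0, theta_pos, theta_neg):
    """Return q[n] = p(y_n = +1 | x_n, theta), computed in log space."""
    log_pos = (np.log(theta0)
               + X @ np.log(theta_pos) + (1 - X) @ np.log(1 - theta_pos))
    log_neg = (np.log(1 - theta0)
               + X @ np.log(theta_neg) + (1 - X) @ np.log(1 - theta_neg))
    # Normalise over the two labels: q = e^{log_pos} / (e^{log_pos} + e^{log_neg})
    return 1.0 / (1.0 + np.exp(log_neg - log_pos))

def nb_m_step(X, q, eps=1e-3):
    """Re-estimate the parameters from the soft labels q (q(y_n = -1) = 1 - q)."""
    theta0 = q.mean()
    theta_pos = (q @ X) / q.sum()
    theta_neg = ((1 - q) @ X) / (1 - q).sum()
    clip = lambda t: np.clip(t, eps, 1 - eps)   # assumption: avoid log(0)
    return clip(theta0), clip(theta_pos), clip(theta_neg)

def nb_em(X, theta0, theta_pos, theta_neg, n_iters=50):
    """Alternate E and M steps for a fixed (hypothetical) number of iterations."""
    for _ in range(n_iters):
        q = nb_e_step(X, theta0, theta_pos, theta_neg)
        theta0, theta_pos, theta_neg = nb_m_step(X, q)
    return theta0, theta_pos, theta_neg, q

# Toy data: two groups of examples with complementary feature patterns.
X = np.vstack([np.tile([1, 1, 0, 0], (20, 1)),
               np.tile([0, 0, 1, 1], (20, 1))])
theta0, theta_pos, theta_neg, q = nb_em(
    X,
    theta0=0.5,
    theta_pos=np.array([0.6, 0.6, 0.4, 0.4]),
    theta_neg=np.array([0.4, 0.4, 0.6, 0.6]),
)
```

Which group ends up as "+1" depends entirely on the initial guess, since the labels are never observed; swapping `theta_pos` and `theta_neg` at initialisation swaps the roles of the two groups.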