
CSCI-567: Machine Learning (Spring 2019)
Prof. Victor Adamchik, University of Southern California
March 26, 2019

Outline
1 Gaussian mixture models: motivation and model; the EM algorithm; EM applied to GMMs
2 Density estimation
3 Naive Bayes revisited

Gaussian mixture models

GMM is a probabilistic approach to clustering. We want to come up with a probabilistic model $p$ that explains how the data is generated: we model each region with a Gaussian distribution. To generate a point, we first randomly pick one of the Gaussian components, then draw a point according to that Gaussian.
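To make this generative story concrete, here is a minimal NumPy sketch (my illustration, not part of the slides) that samples from a three-component GMM exactly as described: pick a component according to the mixture weights, then draw from that component's Gaussian. The weights, means, and covariances are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameters of a 2-D, 3-component GMM.
weights = np.array([0.5, 0.3, 0.2])                              # mixture weights, sum to 1
means = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 3.0]])          # component means
covs = np.array([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])   # component covariances

def sample_gmm(n):
    """Draw n points: choose a component z ~ weights, then x ~ N(mu_z, Sigma_z)."""
    z = rng.choice(len(weights), size=n, p=weights)              # latent component labels
    x = np.array([rng.multivariate_normal(means[k], covs[k]) for k in z])
    return x, z

X, Z = sample_gmm(500)
print(X.shape, np.bincount(Z) / len(Z))  # empirical component frequencies, roughly the weights
```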

GMM: formal definition

A GMM has the following density function:
$p(x) = \sum_{k=1}^{K} \omega_k \, N(x \mid \mu_k, \Sigma_k) = \sum_{k=1}^{K} \frac{\omega_k}{\sqrt{(2\pi)^D |\Sigma_k|}} \, e^{-\frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k)}$
where
- $K$ is the number of Gaussian components (the same as the number of clusters we want);
- $\mu_k$ and $\Sigma_k$ are the mean and covariance matrix of the $k$-th Gaussian;
- $\omega_1, \ldots, \omega_K$ are the mixture weights; they represent how much each component contributes to the final distribution and satisfy two properties: $\omega_k > 0$ for all $k$, and $\sum_k \omega_k = 1$.

An example

With three components (red, blue, green), the conditional distributions are
$p(x \mid z = \text{red}) = N(x \mid \mu_1, \Sigma_1)$, $p(x \mid z = \text{blue}) = N(x \mid \mu_2, \Sigma_2)$, $p(x \mid z = \text{green}) = N(x \mid \mu_3, \Sigma_3)$,
where $z$ is the hidden (latent) variable. The marginal distribution is
$p(x) = p(\text{red}) N(x \mid \mu_1, \Sigma_1) + p(\text{blue}) N(x \mid \mu_2, \Sigma_2) + p(\text{green}) N(x \mid \mu_3, \Sigma_3)$.

Learning GMMs

Learning a GMM means finding all the parameters $\theta = \{\omega_k, \mu_k, \Sigma_k\}_{k=1}^{K}$. How do we learn these parameters? An obvious attempt is maximum-likelihood estimation (MLE):
$\arg\max_\theta \ln \prod_{n=1}^{N} p(x_n; \theta) = \arg\max_\theta \sum_{n=1}^{N} \ln p(x_n; \theta) = \arg\max_\theta P(\theta)$.
The problem is intractable in general (the objective is non-concave, and there are latent variables). One option is to still apply GD/SGD, but a much more effective approach is the Expectation-Maximization (EM) algorithm.

Preview of EM for learning GMMs

Step 0: Initialize $\omega_k, \mu_k, \Sigma_k$ for each $k \in [K]$.
Step 1 (E-step): update the soft assignments, fixing the parameters: $\gamma_{nk} = p(z_n = k \mid x_n) \propto \omega_k N(x_n \mid \mu_k, \Sigma_k)$.
Step 2 (M-step): update the model parameters, fixing the assignments:
$\omega_k = \frac{1}{N}\sum_n \gamma_{nk}$, $\mu_k = \frac{\sum_n \gamma_{nk} x_n}{\sum_n \gamma_{nk}}$, $\Sigma_k = \frac{1}{\sum_n \gamma_{nk}} \sum_n \gamma_{nk} (x_n - \mu_k)(x_n - \mu_k)^T$.
Step 3: return to Step 1 if not converged.
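As a sanity check on the density formula, this small sketch (mine, not from the slides) evaluates $p(x) = \sum_k \omega_k N(x \mid \mu_k, \Sigma_k)$ with SciPy's multivariate normal; the parameters are made-up illustration values.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up parameters of a 2-D, 3-component GMM.
weights = np.array([0.5, 0.3, 0.2])
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0]), np.array([-4.0, 3.0])]
covs = [np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)]

def gmm_density(x):
    """p(x) = sum_k w_k N(x | mu_k, Sigma_k)."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

print(gmm_density(np.array([1.0, 2.0])))   # mixture density at one query point
```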

EM algorithm

In general, EM is a heuristic for solving MLE problems with latent variables (not just GMMs), i.e. for finding the maximizer of
$P(\theta) = \sum_{n=1}^{N} \ln p(x_n; \theta)$,
where $\theta$ denotes the parameters of a general probabilistic model, the $x_n$ are observed random variables, and the $z_n$ are latent variables. Again, directly maximizing this objective is intractable.

EM is a general algorithm for dealing with hidden data: an optimization strategy for objective functions that can be interpreted as likelihoods in the presence of missing data. EM is much simpler than gradient methods (there is no step size to choose) and is an iterative algorithm with two steps:
- E-step: fill in the hidden values using inference;
- M-step: apply the standard MLE method to the completed data.
We will prove that EM always converges to a local optimum of the likelihood.

High level idea

Keep maximizing a lower bound of $P(\theta)$ that is more manageable.

Derivation of EM

Finding the lower bound of $P(\theta)$:
$\ln p(x_n; \theta) = \ln \frac{p(x_n, z_n; \theta)}{p(z_n \mid x_n; \theta)}$ (true for any $z_n$)
$= \mathbb{E}_{z_n \sim q_n}\!\left[\ln \frac{p(x_n, z_n; \theta)}{p(z_n \mid x_n; \theta)}\right]$ (true for any distribution $q_n$).
Let us recall the definitions of expectation, $\mathbb{E}_{z \sim q}[f(z)] = \sum_z q(z) f(z)$, and entropy, $H(q) = -\mathbb{E}_{z \sim q}[\ln q(z)] = -\sum_z q(z) \ln q(z)$.

Derivation of EM (continued)

Continuing from the previous slide,
$\ln p(x_n; \theta) = \mathbb{E}_{z_n \sim q_n}\!\left[\ln \frac{p(x_n, z_n; \theta)}{p(z_n \mid x_n; \theta)}\right]$
$= \mathbb{E}_{z_n \sim q_n}[\ln p(x_n, z_n; \theta)] - \mathbb{E}_{z_n \sim q_n}[\ln q_n(z_n)] - \mathbb{E}_{z_n \sim q_n}\!\left[\ln \frac{p(z_n \mid x_n; \theta)}{q_n(z_n)}\right]$
$= \mathbb{E}_{z_n \sim q_n}[\ln p(x_n, z_n; \theta)] + H(q_n) - \mathbb{E}_{z_n \sim q_n}\!\left[\ln \frac{p(z_n \mid x_n; \theta)}{q_n(z_n)}\right]$ ($H$ is the entropy)
$\ge \mathbb{E}_{z_n \sim q_n}[\ln p(x_n, z_n; \theta)] + H(q_n) - \ln \mathbb{E}_{z_n \sim q_n}\!\left[\frac{p(z_n \mid x_n; \theta)}{q_n(z_n)}\right]$ (Jensen's inequality).

Jensen's inequality

Claim: $\mathbb{E}[\ln X] \le \ln \mathbb{E}[X]$.
Proof (for a uniform distribution over $x_1, \ldots, x_N$). By the definition $\mathbb{E}[X] = \frac{1}{N}(x_1 + \cdots + x_N)$, it follows that
$\mathbb{E}[\ln X] = \frac{1}{N}(\ln x_1 + \cdots + \ln x_N) = \ln (x_1 \cdots x_N)^{1/N} \le \ln \frac{x_1 + \cdots + x_N}{N} = \ln \mathbb{E}[X]$.
This is the AM-GM inequality; for $N = 2$ it is just $\sqrt{x_1 x_2} \le \frac{x_1 + x_2}{2}$.

Derivation of EM (continued)

After applying Jensen's inequality, we obtain
$\ln p(x_n; \theta) \ge \mathbb{E}_{z_n \sim q_n}[\ln p(x_n, z_n; \theta)] + H(q_n) - \ln \mathbb{E}_{z_n \sim q_n}\!\left[\frac{p(z_n \mid x_n; \theta)}{q_n(z_n)}\right]$.
Next, we observe that
$\mathbb{E}_{z_n \sim q_n}\!\left[\frac{p(z_n \mid x_n; \theta)}{q_n(z_n)}\right] = \sum_{z_n} q_n(z_n) \frac{p(z_n \mid x_n; \theta)}{q_n(z_n)} = \sum_{z_n} p(z_n \mid x_n; \theta) = 1$,
so $\ln p(x_n; \theta) \ge \mathbb{E}_{z_n \sim q_n}[\ln p(x_n, z_n; \theta)] + H(q_n)$.

Alternately maximize the lower bound

We have found a lower bound for the log-likelihood function:
$P(\theta) = \sum_{n=1}^{N} \ln p(x_n; \theta) \ge \sum_{n=1}^{N} \Big( \mathbb{E}_{z_n \sim q_n}[\ln p(x_n, z_n; \theta)] + H(q_n) \Big) = F(\theta, \{q_n\})$.
This holds for any $\{q_n\}$, so how do we choose them? Naturally, as the ones that maximize the lower bound, i.e. give the tightest lower bound. This is similar to K-means: we will alternately maximize $F$ over $\{q_n\}$ and $\theta$.
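A tiny numeric illustration (mine, not from the slides) of the claim $\mathbb{E}[\ln X] \le \ln \mathbb{E}[X]$ for a uniform distribution over a few positive numbers, matching the AM-GM argument above.

```python
import numpy as np

x = np.array([0.5, 2.0, 3.0, 10.0])   # arbitrary positive values, uniform weights
lhs = np.mean(np.log(x))              # E[ln X]
rhs = np.log(np.mean(x))              # ln E[X]
print(lhs, rhs, lhs <= rhs)           # Jensen's inequality holds: lhs <= rhs is True
```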

Pictorial explanation

$P(\theta)$ is non-concave, but $F(\theta, \{q_n^{(t)}\})$ often is concave and easy to maximize.

Maximizing over $\{q_n\}$

Fix $\theta^{(t)}$ and maximize $F$ over $\{q_n\}$:
$\arg\max_{q_n} \mathbb{E}_{z_n \sim q_n}[\ln p(x_n, z_n; \theta^{(t)})] + H(q_n) = \arg\max_{q_n} \sum_{k=1}^{K} q_n(k) \ln p(x_n, z_n = k; \theta^{(t)}) - \sum_{k=1}^{K} q_n(k) \ln q_n(k)$
subject to the conditions $q_n(k) \ge 0$ and $\sum_k q_n(k) = 1$. Next, write down the Lagrangian and apply the KKT conditions (you should verify this yourself; a worked sketch follows this section). The solution is
$q_n^{(t)}(z_n = k) = p(z_n = k \mid x_n; \theta^{(t)})$,
i.e., the posterior distribution of $z_n$ given $x_n$ and $\theta^{(t)}$. So at $\theta^{(t)}$ we have found the tightest lower bound $F(\theta, \{q_n^{(t)}\})$: it satisfies $F(\theta, \{q_n^{(t)}\}) \le P(\theta)$ for all $\theta$, with equality $F(\theta^{(t)}, \{q_n^{(t)}\}) = P(\theta^{(t)})$.

Maximizing over $\theta$

Fix $\{q_n^{(t)}\}$ and maximize over $\theta$ (note that $H(q_n^{(t)})$ is independent of $\theta$):
$\arg\max_\theta F(\theta, \{q_n^{(t)}\}) = \arg\max_\theta \sum_{n=1}^{N} \mathbb{E}_{z_n \sim q_n^{(t)}}[\ln p(x_n, z_n; \theta)] = \arg\max_\theta Q(\theta; \theta^{(t)})$,
where the $\{q_n^{(t)}\}$ are computed via $\theta^{(t)}$. $Q$ is called the (expected) complete likelihood and is usually more tractable, since the $z_n$ are no longer latent variables.
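Here is a sketch (my reconstruction of the step the slides leave as an exercise) of the Lagrangian calculation: introduce a multiplier for the normalization constraint, set the derivative to zero, and normalize.

```latex
% Maximize sum_k q(k) ln p(x_n, z_n=k; theta) - sum_k q(k) ln q(k)
% subject to sum_k q(k) = 1 (the constraint q(k) >= 0 holds at the solution).
\begin{align*}
L(q, \lambda) &= \sum_{k=1}^{K} q(k)\,\ln p(x_n, z_n = k;\theta)
               - \sum_{k=1}^{K} q(k)\,\ln q(k)
               + \lambda\Big(\sum_{k=1}^{K} q(k) - 1\Big) \\
\frac{\partial L}{\partial q(k)} &= \ln p(x_n, z_n = k;\theta) - \ln q(k) - 1 + \lambda = 0
  \;\Longrightarrow\; q(k) \propto p(x_n, z_n = k;\theta) \\
\text{Normalizing:}\quad q(k) &= \frac{p(x_n, z_n = k;\theta)}{\sum_{k'} p(x_n, z_n = k';\theta)}
  = p(z_n = k \mid x_n;\theta).
\end{align*}
```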

General EM algorithm

Step 0: Initialize $\theta^{(1)}$ and set $t = 1$.
Step 1 (E-step): update the posterior of the latent variables, $q_n^{(t)} = p(\cdot \mid x_n; \theta^{(t)})$, and obtain the expectation of the complete likelihood
$Q(\theta; \theta^{(t)}) = \sum_{n=1}^{N} \mathbb{E}_{z_n \sim q_n^{(t)}}[\ln p(x_n, z_n; \theta)]$.
Step 2 (M-step): update the model parameters via maximization, $\theta^{(t+1)} = \arg\max_\theta Q(\theta; \theta^{(t)})$.
Step 3: $t \leftarrow t + 1$ and return to Step 1 if not converged.

Pictorial explanation

$P(\theta)$ is non-concave, but $Q(\theta; \theta^{(t)})$ often is concave and easy to maximize. Moreover,
$P(\theta^{(t+1)}) \ge F(\theta^{(t+1)}, \{q_n^{(t)}\}) \ge F(\theta^{(t)}, \{q_n^{(t)}\}) = P(\theta^{(t)})$,
so EM always increases the objective value and converges to some local maximum (similar to K-means).

Apply EM to learn GMMs

E-step:
$q_n^{(t)}(z_n = k) = p(z_n = k \mid x_n; \theta^{(t)}) \propto p(z_n = k; \theta^{(t)}) \, p(x_n \mid z_n = k; \theta^{(t)}) = \omega_k^{(t)} N(x_n \mid \mu_k^{(t)}, \Sigma_k^{(t)})$.
This computes the soft assignment $\gamma_{nk} = q_n^{(t)}(z_n = k)$, i.e. the conditional probability of $x_n$ belonging to cluster $k$.

M-step:
$Q(\theta; \theta^{(t)}) = \sum_{n=1}^{N} \mathbb{E}_{z_n \sim q_n^{(t)}}[\ln p(x_n, z_n; \theta)] = \sum_{n=1}^{N} \mathbb{E}_{z_n \sim q_n^{(t)}}[\ln p(z_n; \theta) + \ln p(x_n \mid z_n; \theta)] = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma_{nk} \big( \ln \omega_k + \ln N(x_n \mid \mu_k, \Sigma_k) \big)$,
to be maximized over $\{\omega_k, \mu_k, \Sigma_k\}_{k=1}^{K}$.
To find $\omega_1, \ldots, \omega_K$, solve $\arg\max_{\omega} \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma_{nk} \ln \omega_k$ subject to $\sum_k \omega_k = 1$ (a sketch of this step follows this section).
To find each $\mu_k, \Sigma_k$, solve $\arg\max_{\mu_k, \Sigma_k} \sum_{n=1}^{N} \gamma_{nk} \ln N(x_n \mid \mu_k, \Sigma_k)$.
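A short sketch (mine) of the constrained maximization for the mixture weights, again via a Lagrange multiplier; it yields the $\omega_k$ update quoted on the next slide.

```latex
% Maximize sum_n sum_k gamma_nk ln w_k subject to sum_k w_k = 1.
\begin{align*}
L(\omega, \lambda) &= \sum_{n=1}^{N}\sum_{k=1}^{K} \gamma_{nk} \ln \omega_k
  + \lambda\Big(\sum_{k=1}^{K} \omega_k - 1\Big) \\
\frac{\partial L}{\partial \omega_k} &= \frac{\sum_{n} \gamma_{nk}}{\omega_k} + \lambda = 0
  \;\Longrightarrow\; \omega_k \propto \sum_{n} \gamma_{nk} \\
\text{Since } \sum_{k}\sum_{n} \gamma_{nk} = N:\quad
  \omega_k &= \frac{1}{N}\sum_{n=1}^{N} \gamma_{nk}.
\end{align*}
```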

M-step (continued)

The solutions to the previous two problems are very natural; for each $k$,
$\omega_k = \frac{1}{N} \sum_n \gamma_{nk}$, i.e. the weighted fraction of examples belonging to cluster $k$;
$\mu_k = \frac{\sum_n \gamma_{nk} x_n}{\sum_n \gamma_{nk}}$, i.e. the weighted average of examples belonging to cluster $k$;
$\Sigma_k = \frac{1}{\sum_n \gamma_{nk}} \sum_n \gamma_{nk} (x_n - \mu_k)(x_n - \mu_k)^T$, i.e. the weighted covariance of examples belonging to cluster $k$.

GMM: putting it together

EM for clustering:
Step 0: Initialize $\omega_k, \mu_k, \Sigma_k$ for each $k \in [K]$.
Step 1 (E-step): update the soft assignments, fixing the parameters: $\gamma_{nk} = p(z_n = k \mid x_n) \propto \omega_k N(x_n \mid \mu_k, \Sigma_k)$.
Step 2 (M-step): update the model parameters, fixing the assignments:
$\omega_k = \frac{1}{N}\sum_n \gamma_{nk}$, $\mu_k = \frac{\sum_n \gamma_{nk} x_n}{\sum_n \gamma_{nk}}$, $\Sigma_k = \frac{1}{\sum_n \gamma_{nk}} \sum_n \gamma_{nk} (x_n - \mu_k)(x_n - \mu_k)^T$.
Step 3: return to Step 1 if not converged.
(A minimal implementation sketch follows this slide.)

Connection to K-means

K-means is in fact a special case of EM for a simplified GMM. Let $\Sigma_k = \sigma^2 I$ for some fixed $\sigma$, so that only the $\omega_k$ and $\mu_k$ are parameters. Then
$\arg\max_\theta \sum_{n=1}^{N} \ln p(x_n; \theta) = \arg\max_\theta \sum_{n=1}^{N} \ln \sum_{k=1}^{K} p(z_n = k) N(x_n \mid \mu_k, \sigma^2 I)$.
If we additionally assume hard assignments, $p(z_n = k) = 1$ if $k = C_n$ (and $0$ otherwise), then
$\arg\max_\theta \sum_{n=1}^{N} \ln p(x_n; \theta) = \arg\max_\theta \sum_{n=1}^{N} \ln \exp\!\Big(-\frac{1}{2\sigma^2} \|x_n - \mu_{C_n}\|_2^2\Big) = \arg\min_{\{\mu_k\}, C} \sum_{n=1}^{N} \|x_n - \mu_{C_n}\|_2^2$,
so EM becomes K-means. GMM is a soft version of K-means, and it provides a probabilistic interpretation of the data.

Outline
1 Gaussian mixture models
2 Density estimation: parametric models; nonparametric models
3 Naive Bayes revisited
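The following is a minimal NumPy sketch of the full EM loop for a GMM as summarized above. It is my illustration, not the course's reference code, and it omits practical details such as log-domain computations and a convergence test on the log-likelihood.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, num_iters=100, seed=0):
    """Fit a K-component GMM to X (N x D) with EM; return weights, means, covariances."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Step 0: initialize parameters.
    weights = np.full(K, 1.0 / K)
    means = X[rng.choice(N, size=K, replace=False)]            # K random points as initial means
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])

    for _ in range(num_iters):
        # E-step: soft assignments gamma_{nk} proportional to w_k N(x_n | mu_k, Sigma_k).
        resp = np.zeros((N, K))
        for k in range(K):
            resp[:, k] = weights[k] * multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate parameters from the soft assignments.
        Nk = resp.sum(axis=0)                                   # effective count per component
        weights = Nk / N
        means = (resp.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)

    return weights, means, covs

# Usage on made-up data: two well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 1.0, size=(200, 2)),
               rng.normal([5, 5], 0.5, size=(200, 2))])
print(em_gmm(X, K=2)[:2])   # learned weights and means, roughly (0.5, 0.5) and the blob centers
```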

Density estimation

What we have done indirectly for clustering with GMMs is: given a training set $x_1, \ldots, x_N$, estimate a density function $p$ that could have generated this dataset via $x_n \overset{\text{i.i.d.}}{\sim} p$. This is exactly the problem of density estimation, another important unsupervised learning problem. It is useful for many downstream applications: we have seen clustering already, and will see more applications today; these applications also provide a way to measure the quality of a density estimator.

Parametric generative models

Parametric estimation assumes a generative model parametrized by $\theta$: $p(x) = p(x; \theta)$. Examples:
- GMM: $p(x; \theta) = \sum_{k=1}^{K} \omega_k N(x \mid \mu_k, \Sigma_k)$, where $\theta = \{\omega_k, \mu_k, \Sigma_k\}$.
- Multinomial, for 1-D examples with $K$ possible values: $p(x = k; \theta) = \theta_k$, where $\theta$ is a distribution over $K$ elements.
The size of $\theta$ is independent of the training set size, which is what makes the model parametric.

Parametric methods

Again, we apply MLE to learn the parameters $\theta$:
$\theta^* = \arg\max_\theta \sum_{n=1}^{N} \ln p(x_n; \theta)$.
In some cases this is intractable and we use EM to approximately solve the MLE (e.g. GMMs); in other cases it admits a simple closed-form solution (e.g. the multinomial).

MLE for multinomial

$\arg\max_\theta \sum_{n=1}^{N} \ln p(x_n; \theta) = \arg\max_\theta \sum_{n=1}^{N} \ln \theta_{x_n} = \arg\max_\theta \sum_{k=1}^{K} \sum_{n : x_n = k} \ln \theta_k = \arg\max_\theta \sum_{k=1}^{K} z_k \ln \theta_k$,
where $z_k = |\{n : x_n = k\}|$ is the number of examples with value $k$. The solution is simply
$\theta_k = \frac{z_k}{N} = \frac{z_k}{\sum_{k'} z_{k'}}$,
i.e. the fraction of examples with value $k$.
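A one-liner illustration (mine) of the closed-form multinomial MLE: count occurrences and normalize.

```python
import numpy as np

x = np.array([2, 0, 1, 2, 2, 1, 0, 2])   # made-up data with K = 3 possible values
counts = np.bincount(x, minlength=3)      # z_k = number of examples with value k
theta_mle = counts / len(x)               # theta_k = z_k / N
print(theta_mle)                          # [0.25 0.25 0.5]
```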

Nonparametric models

Can we estimate a density without assuming a fixed generative model? The high-level idea is to construct something similar to a histogram: for each data point, create a "hump" via a kernel, then sum up all the humps; the more data in a region, the higher the hump (picture from Wikipedia). Kernel density estimation (KDE) is a common approach to nonparametric density estimation. Here "kernel" means something different from the kernel functions we have seen before. We focus on the 1-D continuous case.

Kernel

KDE with a kernel $K(x) : \mathbb{R} \to \mathbb{R}$ centered at each $x_n$:
$p(x) = \frac{1}{N} \sum_{n=1}^{N} K(x - x_n)$.
There are many choices for $K$, for example:
1. $K(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, the standard Gaussian density;
2. $K(x) = \frac{1}{2} \, \mathbb{I}[|x| \le 1]$;
3. $K(x) = \frac{3}{4} \max\{1 - x^2, 0\}$.
Properties of a kernel:
- symmetry: $K(x) = K(-x)$;
- $\int K(x)\,dx = 1$, which ensures that $p$ is a density function.

Bandwidth

If $K(x)$ is a kernel, then for any $h > 0$,
$K_h(u) = \frac{1}{h} K\!\left(\frac{u}{h}\right)$
(a stretched version of the kernel) can be used as a kernel too (verify the two properties yourself). So a general KDE is determined by both the kernel $K$ and the bandwidth $h$:
$p(x) = \frac{1}{N} \sum_{n=1}^{N} K_h(x - x_n) = \frac{1}{Nh} \sum_{n=1}^{N} K\!\left(\frac{x - x_n}{h}\right)$.
The $x_n$ control the centers of the humps; $h$ controls the width/variance of the humps.

Effect of bandwidth

A larger $h$ smooths the density; a small $h$ yields a density that is spiky and very hard to interpret. [Figure from Wikipedia: KDE of the same sample with a Gaussian kernel at several bandwidths; the gray curve is the ground-truth density, the red curve uses $h = 0.05$, the green curve uses $h = 2$, and the black curve uses an intermediate bandwidth.]

Bandwidth selection

Selecting $h$ is a deep topic:
- one can do cross-validation based on downstream applications;
- there are also theoretically-motivated approaches: find a value of $h$ that minimizes the error between the estimated density and the true density,
$\mathbb{E}\big[(p_{\mathrm{KDE}}(x) - p(x))^2\big] = \big(\mathbb{E}[p_{\mathrm{KDE}}(x)] - p(x)\big)^2 + \mathrm{Var}[p_{\mathrm{KDE}}(x)]$.
This expression is an example of the bias-variance tradeoff, which we saw in an earlier lecture.

Outline
1 Gaussian mixture models
2 Density estimation
3 Naive Bayes revisited: setup and assumption; connection to logistic regression; generative and discriminative models
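Below is a small NumPy sketch (my illustration) of a Gaussian-kernel KDE with an explicit bandwidth $h$, following the formula above; the data and bandwidth values are made up.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian density, used as the KDE kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x_query, data, h):
    """p(x) = (1 / (N h)) * sum_n K((x - x_n) / h), evaluated at each query point."""
    x_query = np.atleast_1d(x_query)
    diffs = (x_query[:, None] - data[None, :]) / h     # shape (num_queries, N)
    return gaussian_kernel(diffs).sum(axis=1) / (len(data) * h)

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(1, 1.0, 100)])  # made-up sample
grid = np.linspace(-5, 5, 7)
for h in (0.05, 0.3, 2.0):                             # small, moderate, large bandwidth
    print(h, np.round(kde(grid, data, h), 3))          # spiky vs. smooth estimates
```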

Bayes optimal classifier

Suppose the data $(x, y)$ is drawn from a joint distribution $p(x, y)$. Then the Bayes optimal classifier is
$f^*(x) = \arg\max_{c \in [C]} p(c \mid x)$,
i.e. predict the class with the largest conditional probability. Of course $p(x, y)$ is unknown, but we can estimate it, and that is exactly a density estimation problem! Observe that $p(x, y) = p(y)\,p(x \mid y)$. To estimate $p(x \mid y = c)$ for some $c \in [C]$, we do density estimation using only the data with label $y = c$.

Discrete features

For a label $c \in [C]$,
$p(y = c) = \frac{|\{n : y_n = c\}|}{N}$.
For each possible value $k$ of a discrete feature $d$,
$p(x_d = k \mid y = c) = \frac{|\{n : x_{nd} = k,\ y_n = c\}|}{|\{n : y_n = c\}|}$.

Continuous features

If a feature is continuous, we can do parametric estimation, e.g. via a Gaussian:
$p(x_d = x \mid y = c) = \frac{1}{\sqrt{2\pi\sigma_{cd}^2}} \exp\!\Big(-\frac{(x - \mu_{cd})^2}{2\sigma_{cd}^2}\Big)$,
where $\mu_{cd}$ and $\sigma_{cd}^2$ are the empirical mean and variance of feature $d$ among all examples with label $c$. Or we can do nonparametric estimation, e.g. via a kernel $K$ and bandwidth $h$:
$p(x_d = x \mid y = c) = \frac{1}{|\{n : y_n = c\}|} \sum_{n : y_n = c} K_h(x - x_{nd})$.

How to predict?

Using the Naive Bayes assumption
$p(x \mid y = c) = \prod_{d=1}^{D} p(x_d \mid y = c)$,
the prediction for a new example $x$ is
$\arg\max_{c \in [C]} p(y = c \mid x) = \arg\max_c \frac{p(x \mid y = c)\,p(y = c)}{p(x)} = \arg\max_c p(y = c) \prod_{d=1}^{D} p(x_d \mid y = c) = \arg\max_c \Big( \ln p(y = c) + \sum_{d=1}^{D} \ln p(x_d \mid y = c) \Big)$.
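Here is a compact Gaussian Naive Bayes sketch (mine, not the course's code) implementing exactly the estimates and prediction rule above: per-class priors, per-class and per-feature Gaussian parameters, and an argmax over log-posteriors.

```python
import numpy as np

class GaussianNaiveBayes:
    def fit(self, X, y):
        """Estimate p(y = c) and per-feature Gaussian parameters mu_cd, sigma_cd^2."""
        self.classes = np.unique(y)
        self.log_prior = np.array([np.log(np.mean(y == c)) for c in self.classes])
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])          # (C, D)
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])   # (C, D)
        return self

    def predict(self, X):
        """argmax_c  ln p(y = c) + sum_d ln N(x_d | mu_cd, sigma_cd^2)."""
        # log N(x | mu, var) = -0.5 * (log(2 pi var) + (x - mu)^2 / var)
        log_lik = -0.5 * (np.log(2 * np.pi * self.var)[None, :, :]
                          + (X[:, None, :] - self.mu[None, :, :]) ** 2 / self.var[None, :, :])
        scores = self.log_prior[None, :] + log_lik.sum(axis=2)                      # (N, C)
        return self.classes[np.argmax(scores, axis=1)]

# Usage on made-up data: two classes with shifted feature means.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(2, 1, (100, 3))])
y = np.array([0] * 100 + [1] * 100)
print(GaussianNaiveBayes().fit(X, y).predict(X[:5]))   # mostly class 0
```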

Naive Bayes

For discrete features, plugging in the previous MLE estimates gives
$\arg\max_{c \in [C]} p(y = c \mid x) = \arg\max_c \Big( \ln p(y = c) + \sum_{d=1}^{D} \ln p(x_d \mid y = c) \Big) = \arg\max_c \Big( \ln |\{n : y_n = c\}| + \sum_{d=1}^{D} \ln \frac{|\{n : x_{nd} = x_d,\ y_n = c\}|}{|\{n : y_n = c\}|} \Big)$.

For continuous features with a Gaussian model,
$\arg\max_{c \in [C]} p(y = c \mid x) = \arg\max_c \Big( \ln p(y = c) + \sum_{d=1}^{D} \ln p(x_d \mid y = c) \Big) = \arg\max_c \Big( \ln |\{n : y_n = c\}| + \sum_{d=1}^{D} \ln \frac{1}{\sqrt{2\pi}\,\sigma_{cd}} \exp\!\Big(-\frac{(x_d - \mu_{cd})^2}{2\sigma_{cd}^2}\Big) \Big) = \arg\max_c \Big( \ln |\{n : y_n = c\}| - \sum_{d=1}^{D} \Big( \ln \sigma_{cd} + \frac{(x_d - \mu_{cd})^2}{2\sigma_{cd}^2} \Big) \Big)$.

Connection to logistic regression

Let us fix the variance of each feature to be $\sigma$ (i.e. it is no longer a parameter of the model). The prediction then becomes
$\arg\max_{c} p(y = c \mid x) = \arg\max_c \Big( \ln |\{n : y_n = c\}| - \sum_{d=1}^{D} \Big( \ln \sigma + \frac{(x_d - \mu_{cd})^2}{2\sigma^2} \Big) \Big) = \arg\max_c \Big( \ln |\{n : y_n = c\}| - \frac{\|x\|_2^2}{2\sigma^2} - \sum_{d=1}^{D} \frac{\mu_{cd}^2}{2\sigma^2} + \sum_{d=1}^{D} \frac{\mu_{cd}}{\sigma^2} x_d \Big) = \arg\max_c \Big( w_{c0} + \sum_{d=1}^{D} w_{cd} x_d \Big) = \arg\max_c w_c^T x$,
a linear classifier! Here we denote
$w_{c0} = \ln |\{n : y_n = c\}| - \sum_{d=1}^{D} \frac{\mu_{cd}^2}{2\sigma^2}$ and $w_{cd} = \frac{\mu_{cd}}{\sigma^2}$
(the term $\|x\|_2^2 / (2\sigma^2)$ is the same for every class and can be dropped, and $x$ is augmented with a constant feature $1$ so that $w_c^T x$ absorbs $w_{c0}$).

You can verify that $p(y = c \mid x) \propto e^{w_c^T x}$. This is exactly the softmax function, the same model we used for a probabilistic interpretation of logistic regression! So what is different then? The two approaches learn the parameters in different ways: both use MLE, but one maximizes $p(y = c \mid x)$ and the other maximizes $p(x, y)$. The solutions are also different: logistic regression has no closed form, while naive Bayes admits a simple closed form.
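A quick numeric check (my sketch) that the shared-variance Gaussian Naive Bayes posterior indeed has the softmax form with the weights derived above; all numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
C, D, sigma = 3, 4, 1.5
counts = rng.integers(10, 60, size=C)       # |{n : y_n = c}|, made-up class counts
mu = rng.normal(size=(C, D))                # class-conditional means mu_cd
x = rng.normal(size=D)                      # a test point

# Naive Bayes log-posterior (up to a constant): ln count_c - sum_d (x_d - mu_cd)^2 / (2 sigma^2)
nb_score = np.log(counts) - ((x - mu) ** 2).sum(axis=1) / (2 * sigma**2)
nb_post = np.exp(nb_score) / np.exp(nb_score).sum()

# Softmax with w_c0 = ln count_c - sum_d mu_cd^2 / (2 sigma^2) and w_cd = mu_cd / sigma^2
w0 = np.log(counts) - (mu**2).sum(axis=1) / (2 * sigma**2)
scores = w0 + mu @ x / sigma**2
softmax_post = np.exp(scores) / np.exp(scores).sum()

print(np.allclose(nb_post, softmax_post))   # True: the two posteriors coincide
```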

Two different modeling paradigms

Suppose the training data comes from an unknown joint probabilistic model $p(x, y)$. There are two kinds of classification models in machine learning: generative models and discriminative models.
- The generative approach requires specifying a model for the joint distribution (such as Naive Bayes), and thus maximizes the joint likelihood $\log p(x, y)$.
- The discriminative approach requires only specifying a model for the conditional distribution (such as logistic regression), and thus maximizes the conditional likelihood $\log p(y \mid x)$.
Sometimes modeling is easier with the discriminative approach; sometimes parameter estimation is easier with the generative approach.

Generative model vs. discriminative model

- Example: logistic regression (discriminative) vs. naive Bayes (generative).
- Model: the conditional $p(y \mid x)$ vs. the joint $p(x, y)$ (which might induce the same $p(y \mid x)$).
- Learning: MLE in both cases.
- Accuracy: discriminative is usually better for large $N$; generative is usually better for small $N$.
- Remark: the generative model is more flexible and can generate data after learning.

Example: determining sex (man or woman) based on measurements

Generative approach: propose a model of the joint distribution of $x$ = height and $y$ = sex. [Figure: scatter plot of height vs. weight for our data; red = female, blue = male.] Intuition: we will model how heights vary according to a Gaussian in each sub-population (male and female). Note: this is similar to Naive Bayes for detecting spam emails.

Model of the joint distribution

$p(x, y) = p(y)\,p(x \mid y) = \begin{cases} p_1 \frac{1}{\sqrt{2\pi\sigma_1^2}} e^{-\frac{(x - \mu_1)^2}{2\sigma_1^2}} & \text{if } y = 1 \\ p_2 \frac{1}{\sqrt{2\pi\sigma_2^2}} e^{-\frac{(x - \mu_2)^2}{2\sigma_2^2}} & \text{if } y = 2 \end{cases}$
where $p_1 + p_2 = 1$ are the two prior probabilities that $x$ is given the label $1$ or $2$, respectively, and $p(x \mid y)$ is assumed to be Gaussian.

Parameter estimation

The likelihood of the training data $D = \{x_n, y_n\}_{n=1}^{N}$ with $y_n \in \{1, 2\}$ is
$\log P(D) = \sum_n \log p(x_n, y_n) = \sum_{n : y_n = 1} \log \Big( p_1 \frac{1}{\sqrt{2\pi\sigma_1^2}} e^{-\frac{(x_n - \mu_1)^2}{2\sigma_1^2}} \Big) + \sum_{n : y_n = 2} \log \Big( p_2 \frac{1}{\sqrt{2\pi\sigma_2^2}} e^{-\frac{(x_n - \mu_2)^2}{2\sigma_2^2}} \Big)$.
Maximize the likelihood function: $\arg\max_{p_1, p_2, \mu_1, \mu_2, \sigma_1, \sigma_2} \log P(D)$.

Decision boundary

The decision boundary between the two classes is defined by $p(y = 1 \mid x) \ge p(y = 2 \mid x)$, which is equivalent to $p(x \mid y = 1)\,p(y = 1) \ge p(x \mid y = 2)\,p(y = 2)$. Namely,
$-\frac{(x - \mu_1)^2}{2\sigma_1^2} - \log(\sqrt{2\pi}\,\sigma_1) + \log p_1 \ge -\frac{(x - \mu_2)^2}{2\sigma_2^2} - \log(\sqrt{2\pi}\,\sigma_2) + \log p_2$.
This is quadratic in $x$: for some $a$, $b$ and $c$ it takes the form
$a x^2 + b x + c \ge 0$.
The decision boundary is not linear!

Example of a nonlinear decision boundary

[Figure: parabolic decision boundary between the two classes.] Note: the boundary is characterized by a quadratic function, giving rise to the shape of a parabolic curve.
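A small sketch (mine) of this generative approach for the 1-D example: fit $p_c, \mu_c, \sigma_c$ per class by MLE, then read off the coefficients $a, b, c$ of the quadratic decision function; the synthetic "heights" are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up 1-D "height" data for two classes.
x = np.concatenate([rng.normal(165, 6, 300), rng.normal(178, 8, 200)])
y = np.array([1] * 300 + [2] * 200)

params = {}
for c in (1, 2):
    xc = x[y == c]
    params[c] = (len(xc) / len(x), xc.mean(), xc.std())   # MLE of (p_c, mu_c, sigma_c)

def log_joint(xq, c):
    p, mu, s = params[c]
    return np.log(p) - np.log(np.sqrt(2 * np.pi) * s) - (xq - mu) ** 2 / (2 * s**2)

# g(x) = log p(x, y=1) - log p(x, y=2) is quadratic in x: a x^2 + b x + c.
(p1, m1, s1), (p2, m2, s2) = params[1], params[2]
a = -1 / (2 * s1**2) + 1 / (2 * s2**2)
b = m1 / s1**2 - m2 / s2**2
c = (-m1**2 / (2 * s1**2) + m2**2 / (2 * s2**2)
     + np.log(p1 / p2) + np.log(s2 / s1))
xq = 172.0
print(np.isclose(a * xq**2 + b * xq + c, log_joint(xq, 1) - log_joint(xq, 2)))  # True
```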

A special case

What if we assume the two Gaussians have the same variance? Then we get a linear decision boundary. From the previous slide,
$-\frac{(x - \mu_1)^2}{2\sigma_1^2} - \log(\sqrt{2\pi}\,\sigma_1) + \log p_1 \ge -\frac{(x - \mu_2)^2}{2\sigma_2^2} - \log(\sqrt{2\pi}\,\sigma_2) + \log p_2$.
Setting $\sigma_1 = \sigma_2$, the quadratic terms cancel and we obtain
$b x + c \ge 0$.
Note: equal variances across the two categories could be a very strong assumption. For example, the plot suggests that the male population has a slightly bigger variance (i.e., a bigger ellipse) than the female population.
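For completeness, a short sketch (mine) of the algebra showing why the $x^2$ terms cancel when $\sigma_1 = \sigma_2 = \sigma$.

```latex
% With sigma_1 = sigma_2 = sigma, the log(sqrt(2 pi) sigma) terms cancel and the condition becomes
\begin{align*}
-\frac{(x-\mu_1)^2}{2\sigma^2} + \log p_1 &\ge -\frac{(x-\mu_2)^2}{2\sigma^2} + \log p_2 \\
\frac{(x-\mu_2)^2 - (x-\mu_1)^2}{2\sigma^2} &\ge \log\frac{p_2}{p_1} \\
\underbrace{\frac{\mu_1 - \mu_2}{\sigma^2}}_{b}\, x
  + \underbrace{\frac{\mu_2^2 - \mu_1^2}{2\sigma^2} - \log\frac{p_2}{p_1}}_{c} &\ge 0 .
\end{align*}
```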
