8 : Learning Partially Observed GM: the EM algorithm


10-708: Probabilistic Graphical Models, Spring

Lecturer: Eric P. Xing    Scribes: Auric Qiao, Hao Zhang, Bing Liu

1 Introduction

Two fundamental questions in graphical models are learning and inference. Learning is the process of estimating the parameters, and in some cases the topology of the network, from data. We have covered learning methods for completely observed graphical models, both directed and undirected, in previous chapters. The focus of this chapter is on learning of partially observed or unobserved graphical models. Two estimation principles that will be covered are maximal margin and maximum entropy. On the inference side, we previously covered the elimination algorithm and the message-passing algorithm as exact inference techniques. Other exact inference algorithms include the junction tree algorithm, which is less often used these days. Doing exact inference, however, can be expensive, especially in the domain of big data. Thus, many efforts have been devoted to approximate inference techniques, including stochastic simulation, sampling methods, Markov chain Monte Carlo methods, variational algorithms, etc.

2 Mixture Models

Mixture models are models in which the probability distribution of a data point is defined by a weighted sum of conditional likelihoods. We say that a distribution p is a mixture of K component distributions p_1, p_2, ..., p_K if

    p(x) = \sum_{k=1}^{K} \lambda_k p_k(x)

with the \lambda_k > 0 being the mixture proportions and \sum_k \lambda_k = 1.

2.1 Unobserved Variables

A variable can be unobserved (latent) for a number of reasons. It might be an imaginary quantity meant to provide a simplified and abstract view of the data-generation process, as in speech recognition models. It can also be a real-world object or phenomenon that is difficult or impossible to measure, such as the temperature of a star, the causes of a disease, or evolutionary ancestors. Or it might be a real-world object or phenomenon that was sometimes not measured, due to faulty system components, etc. Latent variables can be either discrete or continuous. Discrete latent variables can be used to partition or cluster data into sub-groups. Continuous latent variables (factors), on the other hand, can be used for dimensionality reduction (factor analysis, etc.).
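To make the mixture density concrete, here is a minimal Python sketch (the two one-dimensional Gaussian components and their weights are toy values chosen for illustration, not taken from the notes) that evaluates p(x) = \sum_k \lambda_k p_k(x) at a few points:

    import numpy as np
    from scipy.stats import norm

    def mixture_pdf(x, lambdas, means, sds):
        """Evaluate p(x) = sum_k lambda_k * N(x | mu_k, sigma_k^2) pointwise."""
        x = np.atleast_1d(x)
        # Stack the K component densities and take the lambda-weighted sum.
        comps = np.stack([norm.pdf(x, loc=m, scale=s) for m, s in zip(means, sds)])
        return np.dot(lambdas, comps)

    # Toy two-component mixture: weights are positive and sum to one.
    lambdas = np.array([0.3, 0.7])
    means, sds = [-2.0, 1.0], [1.0, 0.5]
    print(mixture_pdf([-2.0, 0.0, 1.0], lambdas, means, sds))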

2.2 Gaussian Mixture Models

A Gaussian mixture model is a probabilistic model in which we assume all data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. Consider a mixture of K Gaussians,

    P(X | \mu, \Sigma) = \sum_k \pi_k N(X | \mu_k, \Sigma_k)

where \pi_k is the mixture proportion and N(X | \mu_k, \Sigma_k) is the mixture component. Let us see how this expression is derived. We introduce Z as a latent class-indicator vector:

    P(z_n) = multinomial(z_n : \pi) = \prod_k (\pi_k)^{z_n^k}

X is then a conditional Gaussian variable with a class-specific mean and covariance:

    P(x_n | z_n^k = 1, \mu, \Sigma) = (2\pi)^{-m/2} |\Sigma_k|^{-1/2} \exp\{ -\frac{1}{2} (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) \}

Therefore, the likelihood of a sample x_n can be expressed as:

    P(x_n | \mu, \Sigma) = \sum_{z_n} P(z_n, x_n) = \sum_k p(z_n^k = 1 | \pi) p(x_n | z_n^k = 1, \mu, \Sigma)
                         = \sum_{z_n} \prod_k [ \pi_k N(x_n : \mu_k, \Sigma_k) ]^{z_n^k} = \sum_k \pi_k N(x_n | \mu_k, \Sigma_k)

Now we want to learn the mixture model. In fully observed i.i.d. settings, the log-likelihood decomposes into a sum of local terms (at least for directed models):

    l(\theta; D) = \sum_n \log p(x_n, z_n | \theta) = \sum_n \log p(z_n | \theta_z) + \sum_n \log p(x_n | z_n, \theta_x)

With latent variables, however, all the parameters become coupled together via marginalization:

    l(\theta; D) = \sum_n \log \sum_{z_n} p(x_n, z_n | \theta) = \sum_n \log \sum_{z_n} p(z_n | \theta_z) p(x_n | z_n, \theta_x)

The summation now sits inside the log, so the log-likelihood does not decompose nicely, and it is hard to find a closed-form solution for the maximum likelihood estimates. We can instead learn the mixture densities using gradient descent on the log-likelihood, with

    l(\theta) = \log p(x | \theta) = \log \sum_k \pi_k p_k(x | \theta_k)
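Even though the sum sits inside the log, the incomplete log-likelihood l(\theta) is easy to evaluate numerically. A minimal sketch, assuming NumPy/SciPy, full-covariance Gaussian components, and data stored as an N x d array (all assumptions for illustration), uses the standard log-sum-exp trick:

    import numpy as np
    from scipy.stats import multivariate_normal
    from scipy.special import logsumexp

    def gmm_log_likelihood(X, pi, mus, Sigmas):
        """Incomplete-data log-likelihood: sum_n log sum_k pi_k N(x_n | mu_k, Sigma_k)."""
        # log_comp[n, k] = log pi_k + log N(x_n | mu_k, Sigma_k)
        log_comp = np.stack([np.log(pi[k]) + multivariate_normal.logpdf(X, mus[k], Sigmas[k])
                             for k in range(len(pi))], axis=1)
        return logsumexp(log_comp, axis=1).sum()   # the sum over k sits inside the log, per data point

    X = np.random.randn(100, 2)
    print(gmm_log_likelihood(X, np.array([0.5, 0.5]),
                             [np.zeros(2), np.ones(2)], [np.eye(2), np.eye(2)]))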

[Figure 1: Mixture Model Learning with Latent Variables]

Taking the derivative with respect to \theta_k, we have:

    \partial l / \partial \theta_k = \frac{1}{p(x | \theta)} \frac{\partial}{\partial \theta_k} \sum_{k'} \pi_{k'} p_{k'}(x | \theta_{k'})
                                   = \frac{\pi_k}{p(x | \theta)} \frac{\partial p_k(x | \theta_k)}{\partial \theta_k}
                                   = \frac{\pi_k p_k(x | \theta_k)}{p(x | \theta)} \frac{\partial \log p_k(x | \theta_k)}{\partial \theta_k}
                                   = r_k \frac{\partial l_k}{\partial \theta_k}

where r_k = \pi_k p_k(x | \theta_k) / p(x | \theta) is the responsibility of component k for x. Thus, the gradient is the responsibility-weighted sum of the individual log-likelihood gradients. This can be passed to a conjugate gradient routine.

3 Parameter Constraints

As we can see above, mixture densities with latent variables are not easy to learn. When a gradient approach is used because no closed-form solution can be derived, one has less control over the conditions on the parameters. To mitigate this, one can impose constraints on the parameters. Often we have constraints such as \sum_k \pi_k = 1, and \Sigma_k being symmetric positive definite (hence \Sigma_{k,ii} > 0). We can either use constrained optimization, or we can re-parametrize in terms of unconstrained values. For normalized weights, one can use the softmax transform. For covariance matrices, one can use a Cholesky-type decomposition,

    \Sigma^{-1} = A^T A

where A is upper triangular with positive diagonal:

    A_{ii} = \exp(\lambda_i) > 0;   A_{ij} = \eta_{ij} (j > i);   A_{ij} = 0 (j < i)

and the parameters \lambda_i and \eta_{ij} are unconstrained. One can then use the chain rule to compute \partial l / \partial \pi and \partial l / \partial A.
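A minimal sketch of this unconstrained re-parameterization, assuming NumPy and a small toy dimensionality, is shown below; a gradient-based optimizer could then work directly on the unconstrained \lambda and \eta values:

    import numpy as np

    def softmax(lmbda):
        """Map unconstrained weights to mixture proportions that are positive and sum to 1."""
        e = np.exp(lmbda - lmbda.max())
        return e / e.sum()

    def precision_from_unconstrained(lam_diag, eta_offdiag):
        """Build Sigma^{-1} = A^T A with A upper triangular and A_ii = exp(lam_i) > 0."""
        d = len(lam_diag)
        A = np.zeros((d, d))
        A[np.diag_indices(d)] = np.exp(lam_diag)        # positive diagonal
        A[np.triu_indices(d, k=1)] = eta_offdiag        # unconstrained upper off-diagonal entries
        return A.T @ A                                  # symmetric positive definite by construction

    pi = softmax(np.array([0.2, -1.0, 3.0]))
    P = precision_from_unconstrained(np.array([0.1, -0.3]), np.array([0.7]))
    print(pi.sum(), np.linalg.eigvalsh(P))              # 1.0 and strictly positive eigenvalues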

4 Identifiability

Identifiability is a property that a model has to satisfy in order to do precise inference. A mixture model induces a multi-modal likelihood, so gradient ascent can only find a local maximum. Mixture models are unidentifiable, since we can always switch the hidden labels without affecting the likelihood. Therefore, we should be careful when trying to interpret the meaning of latent variables.

5 Expectation-Maximization Algorithm

We will continue to use the Gaussian mixture model to demonstrate the expectation-maximization algorithm. If we assume that all variables are observed, we can learn the parameters simply by using MLE. The data log-likelihood function is:

    l(\theta; D) = \sum_n \log p(z_n, x_n) = \sum_n \log p(z_n | \pi) p(x_n | z_n, \mu, \sigma)
                 = \sum_n \log \prod_k \pi_k^{z_n^k} + \sum_n \log \prod_k N(x_n; \mu_k, \sigma)^{z_n^k}
                 = \sum_n \sum_k z_n^k \log \pi_k - \frac{1}{2\sigma^2} \sum_n \sum_k z_n^k (x_n - \mu_k)^2 + C

C is some constant that we do not bother calculating, since maximizing the log-likelihood is independent of any additive constant. Our maximum likelihood estimates of the parameters are then:

    \hat{\pi}_{k,MLE} = \arg\max_\pi l(\theta; D)
    \hat{\mu}_{k,MLE} = \arg\max_\mu l(\theta; D)
    \hat{\sigma}_{k,MLE} = \arg\max_\sigma l(\theta; D)

We can easily put these functions into an optimization algorithm, or, in the case of \hat{\mu}_k, obtain a simple closed form:

    \hat{\mu}_{k,MLE} = \frac{\sum_n z_n^k x_n}{\sum_n z_n^k}

However, the above calculations break down in the case of unobserved z_n. For example, it is not possible to use the formula for \hat{\mu}_{k,MLE}, since we do not know what values of z_n^k to use.

5.1 K-means

We can obtain some insight into solving this problem by considering a related algorithm: K-means. The K-means algorithm finds clusters of data points as follows (a minimal sketch is given below):
1. Initialize K points ("centers") randomly.
2. Assign each data point to the closest center.
3. Move each center to the mean of all data points assigned to it.
4. Repeat from step 2 until converged.
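A minimal NumPy sketch of this procedure, assuming squared Euclidean distances and initialization by sampling K data points (details the notes leave unspecified):

    import numpy as np

    def kmeans(X, K, n_iters=100, seed=0):
        """Plain K-means: hard-assign points to the nearest center, then move centers to cluster means."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=K, replace=False)]   # step 1: random initialization
        for _ in range(n_iters):
            # Step 2: assign each point to the closest center (squared Euclidean distance).
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            z = d2.argmin(axis=1)
            # Step 3: move each center to the mean of its assigned points (keep old center if empty).
            new_centers = np.array([X[z == k].mean(axis=0) if np.any(z == k) else centers[k]
                                    for k in range(K)])
            if np.allclose(new_centers, centers):                # step 4: stop when converged
                break
            centers = new_centers
        return centers, z

    X = np.vstack([np.random.randn(50, 2) - 3, np.random.randn(50, 2) + 3])
    centers, labels = kmeans(X, K=2)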

We can view the K-means problem as a graphical model, where the hidden variable is the center to which each data point is assigned. More formally, and writing the above in a form that is closer to our Gaussian mixture model, we get:

    z_n^{(t)} = \arg\min_k (x_n - \mu_k^{(t)})^T \Sigma_k^{-1(t)} (x_n - \mu_k^{(t)})

    \mu_k^{(t+1)} = \frac{\sum_n \delta(z_n^{(t)}, k) x_n}{\sum_n \delta(z_n^{(t)}, k)}

Here, z_n is the center assigned to data point n, and the \mu_k are the centers.

5.2 The EM Algorithm

Moving closer to the Gaussian mixture model, what if instead of using only means as centers, we use Gaussian distributions? This way, each center is associated with a covariance matrix as well as a mean. Now, instead of assigning each data point to the closest center, we can do a soft assignment to all centers. For example, a data point p can be assigned 1/2 to center a, 1/4 to center b, and 1/4 to center c. This essentially gives us the EM algorithm for Gaussian mixture models, which iteratively improves the expected complete log-likelihood:

    \langle l_c(\theta; x, z) \rangle = \sum_n \langle \log p(z_n | \pi) \rangle_{p(z|x)} + \sum_n \langle \log p(x_n | z_n, \mu, \Sigma) \rangle_{p(z|x)}
    = \sum_n \sum_k \langle z_n^k \rangle \log \pi_k - \frac{1}{2} \sum_n \sum_k \langle z_n^k \rangle \big( (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) + \log |\Sigma_k| + C \big)

The E step. During the expectation step, we compute the expected values of the sufficient statistics of the hidden variables, given the current estimates of the parameters. In the case of Gaussian mixture models, we compute the following:

    \tau_n^{k(t)} = \langle z_n^k \rangle_{q^{(t)}} = p(z_n^k = 1 | x_n, \mu^{(t)}, \Sigma^{(t)}) = \frac{\pi_k^{(t)} N(x_n | \mu_k^{(t)}, \Sigma_k^{(t)})}{\sum_i \pi_i^{(t)} N(x_n | \mu_i^{(t)}, \Sigma_i^{(t)})}

The E step is essentially performing inference with an estimate of the parameters. In K-means we calculate the hard assignments z_n^{(t)}, while in the EM algorithm we calculate the expected value of a sufficient statistic, \tau_n^{k(t)}. In the case of Gaussian mixture models, \tau_n^{k(t)} = \langle z_n^k \rangle_{q^{(t)}}.

The M step. During the M step, we estimate the parameters with MLE, using the sufficient statistics of the hidden variables calculated in the E step. In the case of Gaussian mixture models, the sufficient statistic is the expected value of the hidden variables.

    \pi_k^* = \arg\max_\pi \langle l_c(\theta) \rangle  \Rightarrow  \pi_k^{(t+1)} = \frac{\sum_n \langle z_n^k \rangle_{q^{(t)}}}{N} = \frac{\sum_n \tau_n^{k(t)}}{N}

    \mu_k^* = \arg\max_\mu \langle l_c(\theta) \rangle  \Rightarrow  \mu_k^{(t+1)} = \frac{\sum_n \tau_n^{k(t)} x_n}{\sum_n \tau_n^{k(t)}}
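Putting the E step and the \pi and \mu updates together (plus the covariance update that completes the M step just below), one EM iteration for a Gaussian mixture can be sketched as follows; the use of SciPy densities and the absence of any safeguards (e.g., for vanishing responsibilities) are simplifications for illustration:

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_step_gmm(X, pi, mus, Sigmas):
        """One EM iteration for a Gaussian mixture.
        E step: responsibilities tau[n, k]; M step: update pi, mu, Sigma from the weighted data."""
        N, K = X.shape[0], len(pi)
        # E step: tau[n, k] proportional to pi_k N(x_n | mu_k, Sigma_k), normalized over k.
        tau = np.stack([pi[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k]) for k in range(K)], axis=1)
        tau /= tau.sum(axis=1, keepdims=True)
        # M step: responsibility-weighted MLE (the covariance update is stated in the text just below).
        Nk = tau.sum(axis=0)
        pi_new = Nk / N
        mus_new = (tau.T @ X) / Nk[:, None]
        Sigmas_new = []
        for k in range(K):
            d = X - mus_new[k]
            Sigmas_new.append((tau[:, k, None] * d).T @ d / Nk[k])
        return pi_new, mus_new, np.array(Sigmas_new), tau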

The covariance update completes the M step:

    \Sigma_k^* = \arg\max_\Sigma \langle l_c(\theta) \rangle  \Rightarrow  \Sigma_k^{(t+1)} = \frac{\sum_n \tau_n^{k(t)} (x_n - \mu_k^{(t+1)})(x_n - \mu_k^{(t+1)})^T}{\sum_n \tau_n^{k(t)}}

In K-means, \mu_k is calculated as a weighted sum where each weight is 0 or 1, while in EM it is a proper weighted sum.

5.3 Theory of the EM Algorithm

Supposing that the latent variables z could be observed, we define the complete log-likelihood as

    l_c(\theta; x, z) = \log p(x, z | \theta)

With observed z, the above can usually be optimized easily: p(x, z | \theta) is the product of many factors, so the log-likelihood decomposes into a sum of many terms, each of which can be optimized separately. However, z is not observed, so the log-likelihood cannot be optimized in this way. The likelihood function we are really trying to optimize is the log of a marginal probability, called the incomplete log-likelihood:

    l(\theta; x) = \log p(x | \theta) = \log \sum_z p(x, z | \theta)

Unfortunately, this objective function has a sum inside the log, which does not decouple as nicely. Instead, we will optimize the following expected complete log-likelihood:

    \langle l_c(\theta; x, z) \rangle_q = \sum_z q(z | x, \theta) \log p(x, z | \theta)

where q is an arbitrary distribution. Notice that the \log p(x, z | \theta) inside the sum, like the complete log-likelihood, factors nicely. So the expected complete log-likelihood factors and is easier to optimize. However, it is not clear whether maximizing this function actually maximizes the incomplete log-likelihood. As a first step, we can easily show that it yields a lower bound for the incomplete log-likelihood:

    l(\theta; x) = \log p(x | \theta) = \log \sum_z p(x, z | \theta) = \log \sum_z q(z | x) \frac{p(x, z | \theta)}{q(z | x)}

Using Jensen's inequality,

    \geq \sum_z q(z | x) \log \frac{p(x, z | \theta)}{q(z | x)} = \sum_z q(z | x) \log p(x, z | \theta) - \sum_z q(z | x) \log q(z | x) = \langle l_c(\theta; x, z) \rangle_q + H_q

Note that the above is the sum of the expected complete log-likelihood and the entropy of the distribution q.
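A small numerical check of this bound for a single observation and a binary latent variable (the joint probabilities are made-up toy numbers) shows that an arbitrary q gives a strict lower bound while the posterior q makes it tight:

    import numpy as np

    # Toy joint p(x, z | theta) over a binary latent z for a single fixed observation x.
    p_xz = np.array([0.15, 0.45])                 # p(x, z=0 | theta), p(x, z=1 | theta)
    log_px = np.log(p_xz.sum())                   # incomplete log-likelihood l(theta; x)

    def lower_bound(q):
        """<l_c>_q + H_q = sum_z q(z) log p(x, z) - sum_z q(z) log q(z)."""
        return np.sum(q * np.log(p_xz)) - np.sum(q * np.log(q))

    q_arbitrary = np.array([0.5, 0.5])
    q_posterior = p_xz / p_xz.sum()               # p(z | x, theta)
    print(lower_bound(q_arbitrary) <= log_px)             # True: any q gives a lower bound
    print(np.isclose(lower_bound(q_posterior), log_px))   # True: the bound is tight at the posterior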

As Coordinate Ascent

The EM algorithm is also, in some sense, a coordinate ascent algorithm. To see this, define the following functional, called the free energy, for fixed data x:

    F(q, \theta) = \sum_z q(z | x) \log \frac{p(x, z | \theta)}{q(z | x)} \leq l(\theta; x)

The EM algorithm is then

    E step: q^{t+1} = \arg\max_q F(q, \theta^t)
    M step: \theta^{t+1} = \arg\max_\theta F(q^{t+1}, \theta)

In performing the E step, we obtain the posterior distribution of the latent variables given the data and the parameters, i.e.

    q^{t+1} = \arg\max_q F(q, \theta^t) = p(z | x, \theta^t)

To prove this, we just have to show that this value of q^{t+1} attains the bound l(\theta; x) \geq F(q^{t+1}, \theta):

    F(p(z | x, \theta^t), \theta^t) = \sum_z p(z | x, \theta^t) \log \frac{p(x, z | \theta^t)}{p(z | x, \theta^t)} = \sum_z p(z | x, \theta^t) \log p(x | \theta^t) = \log p(x | \theta^t) = l(\theta^t; x)

Without loss of generality, assume that p(x, z | \theta) is a generalized exponential family distribution, so that

    p(x, z | \theta) = \frac{1}{Z(\theta)} h(x, z) \exp\{ \sum_i \theta_i f_i(x, z) \}

Then the expected complete log-likelihood under q^{t+1} = p(z | x, \theta^t) is (with A(\theta) = \log Z(\theta))

    \langle l_c(\theta; x, z) \rangle_{q^{t+1}} = \sum_z q(z | x, \theta^t) \log p(x, z | \theta) = \sum_i \theta_i \langle f_i(x, z) \rangle_{q(z|x,\theta^t)} - A(\theta)

In the special case of p being a generalized linear model,

    = \sum_i \theta_i \langle \eta_i(z) \rangle_{q(z|x,\theta^t)} \xi_i(x) - A(\theta)

Let us now take a look at the M step, and remember that the free energy is the sum of two terms, F(q, \theta) = \langle l_c(\theta; x, z) \rangle_q + H_q. The second term is constant when maximizing with respect to \theta, so we need only consider the first term:

    \theta^{t+1} = \arg\max_\theta \sum_z q(z | x) \log p(x, z | \theta)

When q is optimal, this is the same as MLE of the fully observed p(x, z | \theta), but instead of the sufficient statistics for z, their expectations with respect to p(z | x, \theta) are used.
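The last remark can be made concrete for the exponential-family case above; the following short derivation is an added step, not spelled out in the notes. Maximizing the expected complete log-likelihood over \theta gives a moment-matching condition:

    \theta^{t+1} = \arg\max_\theta \Big( \sum_i \theta_i \langle f_i(x, z) \rangle_{q^{t+1}} - A(\theta) \Big)
    \quad\Rightarrow\quad \frac{\partial A(\theta)}{\partial \theta_i} \Big|_{\theta = \theta^{t+1}} = \langle f_i(x, z) \rangle_{q^{t+1}}

Since \partial A(\theta) / \partial \theta_i = E_{p(x,z|\theta)}[f_i(x, z)], the M step sets the model's expected sufficient statistics equal to those computed in the E step, exactly as in fully observed MLE but with expectations replacing the observed statistics.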

6 Example: Hidden Markov Model

With the EM algorithm, it becomes very straightforward to handle some simple graphical models with hidden variables. Let us take the Hidden Markov Model (HMM) as an example. Generally, the learning paradigms for an HMM fall into the following two categories:

Supervised learning: we estimate the model parameters when the label information is provided, i.e., the right answer is known. For example, we learn from a genomic region x = x_1, x_2, ..., x_N where we have good annotations of the CpG islands.

Unsupervised learning: we estimate the model without label information. For instance, for the porcupine genome we do not know how frequent the CpG islands are, nor do we know their composition, but we still want to learn from the data.

Our goal is to learn from the data using maximum likelihood estimation (MLE). Specifically, we update the model parameters \theta by maximizing p(x | \theta). We now introduce the Baum-Welch algorithm, which is a variant of the EM algorithm for learning HMMs. Recall that the complete log-likelihood of an HMM is written as follows:

    l_c(\theta; x, y) = \log p(x, y) = \log \prod_n \Big( p(y_{n,1}) \prod_{t=2}^{T} p(y_{n,t} | y_{n,t-1}) \prod_{t=1}^{T} p(x_{n,t} | y_{n,t}) \Big)    (1)

Its expectation can be written as:

    \langle l_c(\theta; x, y) \rangle = \sum_n \sum_i \langle y_{n,1}^i \rangle_{p(y_{n,1}|x_n)} \log \pi_i
                                      + \sum_n \sum_{t=2}^{T} \sum_{i,j} \langle y_{n,t-1}^i y_{n,t}^j \rangle_{p(y_{n,t-1}, y_{n,t}|x_n)} \log a_{i,j}
                                      + \sum_n \sum_{t=1}^{T} \sum_{i,k} x_{n,t}^k \langle y_{n,t}^i \rangle_{p(y_{n,t}|x_n)} \log b_{i,k}    (2)

where we denote A = \{a_{i,j}\} = p(y_t^j = 1 | y_{t-1}^i = 1) as the transition matrix and B = \{b_{i,k}\} = p(x_t^k = 1 | y_t^i = 1) as the emission matrix. Accordingly, we derive the EM algorithm for the HMM as follows.

The E step:

    \gamma_{n,t}^i = \langle y_{n,t}^i \rangle = p(y_{n,t}^i = 1 | x_n)
    \xi_{n,t}^{i,j} = \langle y_{n,t-1}^i y_{n,t}^j \rangle = p(y_{n,t-1}^i = 1, y_{n,t}^j = 1 | x_n)    (3)

The M step:

    \pi_i^{ML} = \frac{\sum_n \gamma_{n,1}^i}{N}
    a_{ij}^{ML} = \frac{\sum_n \sum_{t=2}^{T} \xi_{n,t}^{i,j}}{\sum_n \sum_{t=1}^{T-1} \gamma_{n,t}^i}
    b_{ik}^{ML} = \frac{\sum_n \sum_{t=1}^{T} \gamma_{n,t}^i x_{n,t}^k}{\sum_n \sum_{t=1}^{T} \gamma_{n,t}^i}    (4)
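These updates can be sketched compactly in NumPy. The sketch below assumes discrete emissions, integer-coded observation sequences, and a scaled forward-backward recursion for the E step; none of these implementation details are spelled out in the notes:

    import numpy as np

    def forward_backward(x, pi, A, B):
        """E step for one sequence: posterior state marginals gamma and pairwise posteriors xi.
        x: integer observation sequence, pi: initial distribution (M,), A: transitions (M,M), B: emissions (M,V)."""
        T, M = len(x), len(pi)
        alpha = np.zeros((T, M)); beta = np.zeros((T, M)); c = np.zeros(T)
        alpha[0] = pi * B[:, x[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):                           # scaled forward recursion
            alpha[t] = (alpha[t-1] @ A) * B[:, x[t]]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):                  # scaled backward recursion
            beta[t] = (A @ (B[:, x[t+1]] * beta[t+1])) / c[t+1]
        gamma = alpha * beta                            # gamma[t, i] = p(y_t^i = 1 | x)
        xi = np.zeros((T - 1, M, M))                    # xi[t, i, j] = p(y_t^i = 1, y_{t+1}^j = 1 | x)
        for t in range(T - 1):
            xi[t] = (alpha[t][:, None] * A * (B[:, x[t+1]] * beta[t+1])[None, :]) / c[t+1]
        return gamma, xi

    def baum_welch_update(seqs, pi, A, B):
        """One EM (Baum-Welch) iteration: accumulate expected counts (E step), then renormalize (M step)."""
        M, V = B.shape
        pi_new = np.zeros(M); A_new = np.zeros((M, M)); B_new = np.zeros((M, V))
        for x in seqs:
            gamma, xi = forward_backward(x, pi, A, B)
            pi_new += gamma[0]
            A_new += xi.sum(axis=0)
            for t, xt in enumerate(x):
                B_new[:, xt] += gamma[t]
        return (pi_new / pi_new.sum(),
                A_new / A_new.sum(axis=1, keepdims=True),
                B_new / B_new.sum(axis=1, keepdims=True))

    # Toy usage: two hidden states, two output symbols, two short sequences.
    seqs = [np.array([0, 1, 1, 0, 1]), np.array([1, 1, 0, 0])]
    pi0 = np.array([0.5, 0.5])
    A0 = np.array([[0.7, 0.3], [0.4, 0.6]])
    B0 = np.array([[0.9, 0.1], [0.2, 0.8]])
    pi1, A1, B1 = baum_welch_update(seqs, pi0, A0, B0)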

In the unsupervised learning setting, we are given x = x_1, ..., x_N while the state paths y = y_1, ..., y_N are unknown. The EM algorithm proceeds accordingly:
1. Start with the best guess of the parameters \theta for the model.
2. Estimate the expected counts A_{ij}, B_{ik} on the training data:

    A_{ij} = \sum_{n,t} \langle y_{n,t-1}^i y_{n,t}^j \rangle,    B_{ik} = \sum_{n,t} \langle y_{n,t}^i \rangle x_{n,t}^k    (5)

3. Update \theta according to A_{ij}, B_{ik}; the problem now becomes supervised learning.
4. Repeat steps 2 and 3 until convergence.
The above algorithm is called the Baum-Welch algorithm.

7 EM for general BNs

We now state the EM algorithm for general Bayesian networks:

    while not converged:
        for each node i: ESS_i = 0
        for each data sample n:
            do inference with the hidden variables x_{n,H}
            for each node i:
                ESS_i += \langle SS_i(x_{n,i}, x_{n,\pi_i}) \rangle_{p(x_{n,H} | x_{n,-H})}
        for each node i: \theta_i = MLE(ESS_i)

8 Summary: EM algorithm

In summary, the EM algorithm can be regarded as a way of maximizing the likelihood function for latent variable models. It finds the MLE of the parameters when the original (hard) problem can be broken up into two (easy) pieces: first, estimate the missing or unobserved data from the observed data and the current parameters; then, using this "completed" data, find the maximum likelihood parameter estimates. The EM algorithm alternates between filling in the latent variables using the best guess and updating the parameters based on this guess. Specifically, in the M-step we optimize a lower bound on the likelihood, and in the E-step we close the gap, making the bound equal to the likelihood.
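The per-node loop of Section 7 can be mirrored in a generic Python skeleton. The three callbacks (infer_posterior, expected_suff_stats, mle_from_ess) are hypothetical placeholders for model-specific routines, and the fixed iteration count stands in for a real convergence check:

    def em_for_bn(data, nodes, theta, infer_posterior, expected_suff_stats, mle_from_ess,
                  n_iters=50):
        """Generic EM skeleton for a Bayesian network with hidden nodes.
        The callbacks are hypothetical, model-specific routines:
          infer_posterior(x_n, theta)            -> posterior over the hidden nodes for sample n
          expected_suff_stats(i, x_n, posterior) -> expected sufficient statistics for node i
          mle_from_ess(i, ess_i)                 -> MLE of node i's parameters from its ESS
        A real implementation would also monitor the log-likelihood and stop at convergence."""
        for _ in range(n_iters):
            ess = {i: 0.0 for i in nodes}                     # reset expected sufficient statistics
            for x_n in data:                                  # E step: accumulate ESS over samples
                posterior = infer_posterior(x_n, theta)       # inference over the hidden nodes
                for i in nodes:
                    ess[i] += expected_suff_stats(i, x_n, posterior)
            theta = {i: mle_from_ess(i, ess[i]) for i in nodes}   # M step: per-node MLE
        return theta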

9 Conditional Mixture Model and EM Algorithm

Consider the following situation: we model p(Y | X) using different experts, each responsible for a different region of the input space. We have a latent variable Z, which chooses an expert using a softmax gating function:

    p(z^k = 1 | x) = Softmax(\xi_k^T x)    (6)

Each expert can then be a linear regression model (or a more complex model):

    p(y | x, z^k = 1) = N(y; \theta_k^T x, \sigma_k^2)    (7)

The posterior expert responsibilities are:

    p(z^k = 1 | x, y, \theta) = \frac{p(z^k = 1 | x) p_k(y | x, \theta_k, \sigma_k^2)}{\sum_j p(z^j = 1 | x) p_j(y | x, \theta_j, \sigma_j^2)}    (8)

We now derive the EM algorithm for the conditional mixture model. The objective function is:

    \langle l_c(\theta; x, y, z) \rangle = \sum_n \langle \log p(z_n | x_n, \xi) \rangle_{p(z|x,y)} + \sum_n \langle \log p(y_n | x_n, z_n, \theta, \sigma) \rangle_{p(z|x,y)}
    = \sum_n \sum_k \langle z_n^k \rangle \log \mathrm{Softmax}(\xi_k^T x_n) - \frac{1}{2} \sum_n \sum_k \langle z_n^k \rangle \Big( \frac{(y_n - \theta_k^T x_n)^2}{\sigma_k^2} + \log \sigma_k^2 + C \Big)    (9)

The EM algorithm:

E-step:

    \tau_n^{k(t)} = p(z_n^k = 1 | x_n, y_n, \theta) = \frac{p(z^k = 1 | x_n) p_k(y_n | x_n, \theta_k, \sigma_k^2)}{\sum_j p(z^j = 1 | x_n) p_j(y_n | x_n, \theta_j, \sigma_j^2)}    (10)

M-step: We use the normal equations for standard linear regression, \theta = (X^T X)^{-1} X^T Y, but with the data reweighted by \tau_k. We can also use IRLS or weighted IRLS to update \{\xi_k, \theta_k, \sigma_k\} based on the data pairs (x_n, y_n), with weights \tau_n^{k(t)}.

10 Hierarchical Mixture of Experts

The hierarchical mixture of experts is like a soft version of a depth-2 classification/regression tree. In the model, P(Y | X, G_1, G_2) can be modeled as a GLIM, with parameters dependent on the values of G_1 and G_2, which specify a conditional path to a given leaf in the tree.

11 Mixture of Overlapping Experts

Different from the typical mixture of experts, by removing the link between X and Z in the graphical model of the mixture of experts, we can make the partitions independent of the input, thus allowing overlap. Accordingly, this leads to a mixture of linear regressors, in which each subpopulation has a different conditional mean:

    p(z^k = 1 | x, y, \theta) = \frac{p(z^k = 1) p_k(y | x, \theta_k, \sigma_k^2)}{\sum_j p(z^j = 1) p_j(y | x, \theta_j, \sigma_j^2)}    (11)
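In both the conditional mixture of experts (Section 9) and the mixture of overlapping experts above, the M step for each expert's regression parameters reduces to a responsibility-weighted least-squares problem. A minimal sketch of that update (the small ridge term eps is an added assumption for numerical stability, not part of the notes):

    import numpy as np

    def weighted_least_squares(X, y, tau, eps=1e-8):
        """M-step update for one expert: theta_k = (X^T W X)^{-1} X^T W y with W = diag(tau_k)."""
        W = np.diag(tau)
        theta = np.linalg.solve(X.T @ W @ X + eps * np.eye(X.shape[1]), X.T @ W @ y)
        resid = y - X @ theta
        sigma2 = (tau * resid ** 2).sum() / tau.sum()   # responsibility-weighted noise variance
        return theta, sigma2

    # Toy usage: responsibilities tau would come from the E step in equation (10).
    X = np.random.randn(200, 3); y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(200)
    tau = np.random.rand(200)
    theta_k, sigma2_k = weighted_least_squares(X, y, tau)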

12 Partially Hidden Data

We can also learn the model parameters when only some of the cases have missing (hidden) variables. In this setting, the objective function becomes:

    l_c(\theta; D) = \sum_{n \in Complete} \log p(x_n, y_n | \theta) + \sum_{m \in Missing} \log \sum_{y_m} p(x_m, y_m | \theta)    (12)

We can now think of this in a new way: in the E-step we estimate the hidden variables on the incomplete cases only. Then, in the M-step we optimize the log-likelihood on the complete data plus the expected log-likelihood on the incomplete data obtained from the E-step.

13 EM Variants

In this section, we briefly introduce two variants of the basic EM algorithm.

The first is sparse EM. It does not exactly re-compute the posterior probability of each data point under every model component, because many of these posteriors are almost zero. Instead, it keeps an "active list" that is updated every once in a while.

The second is generalized (incomplete) EM. Sometimes it may be very difficult to find the ML parameters in the M-step, even given the completed data. However, we can still make progress by doing an M-step that merely improves the likelihood somewhat. This is in the same spirit as the IRLS step in the mixture of experts model.

14 A Report Card for EM

EM is one of the most popular methods for obtaining the MLE or MAP estimate of a model with latent variables. The advantages of EM are as follows:
- no learning rate (step-size) parameter
- automatically enforces parameter constraints
- very fast for low dimensions
- each iteration guaranteed to improve the likelihood

The disadvantages of EM are:
- can get stuck in local optima
- can be slower than conjugate gradient (especially near convergence)
- requires an expensive inference step
- is a maximum likelihood/MAP method

There are also some modern developments beyond traditional EM, such as convex relaxation and direct non-convex optimization.
