Probability Density Function Estimation by Different Methods

ENEE 739Q SPRING 2002 COURSE ASSIGNMENT REPORT

Vikas Chandrakant Raykar

Abstract -- The aim of the assignment was to estimate the probability density function (PDF) of an arbitrary distribution from a set of training samples. PDF estimation was done using parametric methods (Maximum Likelihood estimation of a Gaussian model), non-parametric methods (histogram, kernel based, and K-nearest neighbor), and semi-parametric methods (the EM algorithm and gradient based optimization). The application of the EM algorithm to binary sequence estimation is also discussed.

I. INTRODUCTION

A Bayesian approach towards pattern classification consists of feature extraction and classification. Feature extraction involves extracting a lower dimensional feature vector from the pattern. Once the feature vector is extracted, the pattern can be classified based on the Bayes decision rule. Consider a C class problem, and let x be a feature vector extracted from the given input pattern. The decision rule can be stated as

Decide $C_i$ if $p(C_i|x) > p(C_j|x)$ for all $j \neq i$. (1)

The posterior probability can be calculated using Bayes' theorem as

$p(C_i|x) = \frac{p(x|C_i)\,p(C_i)}{p(x)}$. (2)

So the important part is the evaluation of the class conditional density $p(x|C_i)$ for each of the C classes. This is the training phase, where we have a set of feature vectors belonging to class $C_i$, also called training samples, $\chi = \{x_1, x_2, \ldots, x_N\}$, and we estimate $p(x|C_i)$ given the training samples. This has to be done for all the classes. To ease notation, $p(x|C_i)$ is written as $p(x)$; the rest of the discussion is with respect to one class only.

The different methods for PDF estimation can be classified as parametric, non-parametric, and semi-parametric. In parametric methods the PDF is assumed to be of a standard form (generally Gaussian, Rayleigh, or uniform), and the parameters of the assumed PDF are estimated using either ML estimation or Bayesian estimation. The non-parametric methods include the histogram based, kernel based, and K-nearest neighbor methods. In semi-parametric methods the given density is modeled as a combination of known densities, and the parameters are estimated using either gradient descent or the Expectation Maximization (EM) algorithm.

Section II discusses the example used to compare the various PDF estimation techniques and the performance measure used. Sections III, IV, and V discuss the parametric, non-parametric, and semi-parametric techniques respectively. Section VI concludes. Section VII discusses the application of the EM algorithm to binary sequence estimation.

(This report was written for ENEE 739Q SPRING 2002 as a part of the course project. The author is a graduate student at the Department of Electrical Engineering, University of Maryland, College Park, MD 20742 USA.)

II. PROGRAM DETAILS

A 2-dimensional feature vector was used in the program. Figure 1 shows the original density function used: the brightness of a pixel corresponds to the density value at that point, and in our case the density function is uniform in the white region. We would like to estimate the density from a set of training samples drawn from it. The training samples were drawn from a uniform distribution over the entire range of the image, and a sample was retained if it belonged to the white region or else discarded. In this way N random training samples were drawn.

Figure 1: Plot of the original PDF used.

A GUI was written in MATLAB 6.1 to estimate the PDF from these samples using the different methods. Figure 2 shows a snapshot of the GUI.

Figure 2: A snapshot of the GUI.
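The sampling procedure described above is plain rejection sampling against the white region of the image. The sketch below shows one way to implement it; since the original program was a MATLAB GUI, the Python function and the binary-image representation here are illustrative assumptions, not the author's code.

```python
import numpy as np

def draw_training_samples(img, n_samples, rng=None):
    """Draw samples from a density that is uniform over the white
    (True) region of a binary image, by rejection sampling."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    samples = []
    while len(samples) < n_samples:
        # Propose a point uniformly over the entire range of the image.
        x, y = rng.uniform(0, w), rng.uniform(0, h)
        # Retain the sample only if it falls in the white region.
        if img[int(y), int(x)]:
            samples.append((x, y))
    return np.array(samples)

# Toy example: the "white region" is a centered square.
img = np.zeros((100, 100), dtype=bool)
img[30:70, 30:70] = True
X = draw_training_samples(img, 500)
```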

Once the PDF was estimated, the method was evaluated using the Kullback-Leibler distance. The performance was evaluated as follows. First we draw M test samples from the image, called $\chi_{test}$. The estimated PDF is evaluated at each of the M points, giving $p_{eval}(x)$. Let $p(x)$ be the original PDF; then the Kullback-Leibler distance D is defined as

$D = \sum_{x \in \chi_{test}} p(x) \ln\!\left(\frac{p(x)}{p_{eval}(x)}\right)$. (3)

Although D does not satisfy the triangle inequality and is therefore not a true metric, it satisfies many important mathematical properties: it is a convex function of $p_{eval}(x)$, is always nonnegative, and equals zero only if $p_{eval}(x) = p(x)$. For iterative algorithms, D was plotted as a function of the iteration number. Whenever $p_{eval}(x)$ was zero, D was evaluated by setting $p_{eval}(x)$ to a very small value. Note that it does not make sense to use this measure to compare different methods, since we are choosing the test points only where the original PDF is not zero. In ML estimation, for example, we may get a good estimate in the region where the original PDF is not zero while the estimate is very bad where the original PDF is zero, and the measure would not reflect this. However, the measure is useful for studying the effect of changing the parameters of a given method.

III. PARAMETRIC ESTIMATION

In parametric estimation the PDF is assumed to have a known distribution; in our case a standard bivariate Gaussian was used. The standard multivariate Gaussian has the form

$p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$. (4)

For the bivariate case $d = 2$, x is a 2-D vector, $\mu$ is the $2 \times 1$ mean vector, and $\Sigma$ is the $2 \times 2$ covariance matrix. The parameters $\mu$ and $\Sigma$ can be estimated using either Bayesian estimation or Maximum Likelihood (ML) estimation. Using the randomly drawn training samples $\chi = \{x_1, x_2, \ldots, x_N\}$, the mean and the covariance matrix are given by ML estimation as

$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \hat{\Sigma} = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \hat{\mu})(x_i - \hat{\mu})^T$, (5)

where $\hat{\mu}$ and $\hat{\Sigma}$ are the estimated mean vector and covariance matrix respectively. $\hat{\mu}$ is an unbiased, consistent estimate of the mean vector. $\hat{\Sigma}$ is divided by $N-1$ rather than N in order to make the covariance estimate unbiased; the covariance estimate is also consistent.

Figure 3 shows the plot of the original and the estimated PDF for N = 500. The estimated PDF matches the original only in the mean and covariance sense. This is because our basic assumption of modeling the distribution as a single bivariate Gaussian is not sufficient.

Figure 3: Original and estimated PDF using ML estimation.

Figure 4 shows the Kullback-Leibler distance as a function of N for 500 test points (i.e. M = 500). Increasing N beyond 300 does not help much, since our model is essentially flawed.
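A minimal sketch of the ML estimator of eq. (5), the Gaussian of eq. (4), and the evaluation score of eq. (3) follows. It assumes the true density values at the test points are available; the names are illustrative, not taken from the original MATLAB program.

```python
import numpy as np

def fit_gaussian_ml(X):
    """ML estimates of the mean and covariance, eq. (5).
    The covariance is divided by N-1 to make the estimate unbiased."""
    mu = X.mean(axis=0)
    sigma = (X - mu).T @ (X - mu) / (len(X) - 1)
    return mu, sigma

def gaussian_pdf(X, mu, sigma):
    """Evaluate the multivariate Gaussian of eq. (4) at the rows of X."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(sigma)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return norm * np.exp(-0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff))

def kl_distance(p_true, p_eval, eps=1e-12):
    """Kullback-Leibler distance of eq. (3) over the test points.
    Zero estimated densities are clipped to a very small value,
    as described in the text."""
    p_eval = np.maximum(p_eval, eps)
    return np.sum(p_true * np.log(p_true / p_eval))
```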

Figure 4: Plot of D vs. N for ML estimation, M = 500 test points.

IV. NON-PARAMETRIC METHODS

1. Histogram

In this approach the entire image is divided into a number of small bins, and using the training samples $\chi = \{x_1, x_2, \ldots, x_N\}$ the PDF is calculated as a histogram. This is a very direct and simple approach, and once the PDF is estimated the training data can be discarded. The disadvantages are that we may lose some information, and that the method is computationally expensive in higher dimensions. The bin size M has to be chosen optimally: if M is too large we get a spiky PDF, while if M is too small there is a significant loss of structure.

Figure 5 shows D as a function of bin size for different N. Using this curve to decide the bin size does not make sense, since we are evaluating only where the original PDF is not zero. In our case, because the original distribution is uniform, D decreases as the bin size increases; this may not hold for a general PDF. The only conclusion we can draw is that as N increases we get better estimates. Figure 6 shows the estimated PDF for one choice of bin size and N.

Figure 5: Bin size vs. Kullback-Leibler distance for different N, for the histogram based PDF estimator (500 test points).

Figure 6: Histogram estimate of the PDF.

2. A principled approach

A more principled version of the histogram can be formulated. Given training samples $\chi = \{x_1, x_2, \ldots, x_N\}$, let K samples lie inside a region R of volume V. Then the PDF at any point inside the region R is given by $p(x) = K/(NV)$. Kernel based methods fix V and find K; the K-nearest neighbor method fixes K and finds V. The advantage is that these methods do not suffer from the curse of dimensionality; the disadvantage is that we need to keep all the data in order to evaluate the PDF.

2.1 Kernel based methods

In this method we fix the volume V of the region R and vary K. The estimated PDF at any point x is given by

$\hat{p}(x) = \frac{1}{N h^d} \sum_{n=1}^{N} H\!\left(\frac{x - x_n}{h}\right)$, (6)

where $H(\cdot)$ is the kernel function. In the simplest case the kernel is a hypercube of side h centered at $x_n$:

$H\!\left(\frac{x - x_n}{h}\right) = 1$ if x falls inside the hypercube of side h centered at $x_n$, and 0 otherwise.

The hypercube is basically a discontinuous kernel; instead of the hypercube we can choose a Gaussian kernel. The variance s of the Gaussian kernel and the side h of the hypercube are the smoothing parameters, and they have to be chosen optimally. If the smoothing parameter is too low then the PDF is very patchy, and N has to be very large to get a good estimate. If the smoothing parameter is too large then the PDF spreads out.

Figure 7 shows the Kullback-Leibler distance for the square kernel as a function of h for different N, with M = 500 test points. Initially D decreases, reaches a minimum at a certain point, and then increases again. The region where D decreases (i.e. the estimate improves) is where the squares have enough width to overlap. From the plot, for N = 600 the optimal value of h is around 4 to 6. Figure 8 shows the estimated PDF and the original PDF for N = 600 and h = 6 with the rectangular kernel.

Figure 7: Plot of h vs. Kullback-Leibler distance for different N, for the rectangular kernel based PDF estimator (500 test points).

Figure 8: Estimated PDF using the rectangular kernel based method for N = 600 and h = 6.

Figure 9 shows the Kullback-Leibler distance for the Gaussian kernel as a function of the variance s for different N, with 800 test points. Again D decreases as the smoothing parameter increases up to a certain point and then increases; the optimal value of s can be read off the plot. Also, as N increases the curves shift downwards, which simply reflects that more training samples give a better estimate of the PDF. Figure 10 shows the estimated PDF for N = 500 using the Gaussian kernel.

Figure 9: Plot of sigma vs. Kullback-Leibler distance for different N, for the Gaussian kernel based PDF estimator (800 test points).

Figure 10: Estimated PDF using the Gaussian kernel based method, N = 500.
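A sketch of the kernel estimator of eq. (6), supporting both the hypercube and the Gaussian kernels discussed above; the vectorized implementation and the names are illustrative assumptions.

```python
import numpy as np

def kernel_density(x, X_train, h, kernel='gaussian'):
    """Kernel density estimate of eq. (6) at the query points x (m x d),
    from training samples X_train (N x d), with smoothing parameter h."""
    m, d = x.shape
    N = len(X_train)
    # Pairwise scaled differences (x - x_n) / h, shape (m, N, d).
    u = (x[:, None, :] - X_train[None, :, :]) / h
    if kernel == 'hypercube':
        # H(u) = 1 when the point falls inside the unit hypercube.
        K = np.all(np.abs(u) <= 0.5, axis=2).astype(float)
    else:
        # Gaussian kernel with unit variance in the scaled coordinates.
        K = np.exp(-0.5 * np.sum(u ** 2, axis=2)) / (2 * np.pi) ** (d / 2)
    return K.sum(axis=1) / (N * h ** d)
```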

2.2 K nearest neighbor

In this method K is fixed and V is varied: we search for the K nearest neighbors of the query point, and the volume of the region containing them gives the estimate. K has to be chosen optimally for a good estimate of the PDF. Figure 11 shows the Kullback-Leibler distance as a function of K for different N. Figure 12 shows the estimated PDF for N = 300.

Figure 11: Plot of K vs. Kullback-Leibler distance for different N, for the K-nearest-neighbor PDF estimator (500 test points).

Figure 12: Estimated PDF using the K-nearest-neighbor method for N = 300.
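A sketch of the K-nearest-neighbor estimate $p(x) = K/(NV)$ for the 2-D case used in these experiments, where V is taken as the area of the disc reaching the K-th nearest neighbor; this brute-force version is an illustrative assumption, not the report's implementation.

```python
import numpy as np

def knn_density(x, X_train, K):
    """K-nearest-neighbor density estimate p(x) = K / (N * V), where V is
    the volume of the smallest ball around x that contains K samples.
    Written for 2-D data, where the ball is a disc of area pi * r**2."""
    N = len(X_train)
    dists = np.sort(np.linalg.norm(X_train - x, axis=1))
    r = dists[K - 1]        # distance to the K-th nearest neighbor
    V = np.pi * r ** 2      # area of the 2-D disc of radius r
    return K / (N * V)
```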

V. SEMI-PARAMETRIC METHODS

These methods combine the flexibility of non-parametric methods with the efficiency in evaluation of parametric methods. Here we model the PDF as a mixture of parametric PDFs. The parameters have to be estimated either by an optimization technique like gradient descent or by the Expectation Maximization (EM) algorithm.

1. EM Algorithm

The convergence properties of the EM algorithm were studied as a function of the number of iterations. Figure 13 shows the Kullback-Leibler distance for N = 500 training samples and 500 test points, for different M (number of mixture components), as a function of the iteration number. It can be seen that the EM algorithm converges in three to six iterations. Also, D decreases as M increases.

Figure 13: Kullback-Leibler distance for N = 500 and 500 test points, for different M (number of mixture components), as a function of the iteration number.

Figure 14 shows the Kullback-Leibler distance for M = 10 component densities and 500 test points, for different N, as a function of the iteration number. Increasing N has no effect on the speed of convergence; however, as N increases the Kullback-Leibler distance decreases.

Figure 14: Kullback-Leibler distance for M = 10 and 500 test points, for different N, as a function of the iteration number.

Figure 15 shows the log likelihood for N = 500 and 500 test points as a function of the iteration number for different M. The log likelihood increases monotonically up to the point where the algorithm converges.

Figure 15: Log likelihood for N = 500 and 500 test points as a function of the iteration number, for different M.

The EM algorithm also depends on the initialization strategy, which in turn affects the number of iterations required to converge. If the initial points are within the uniform region of the PDF then the EM algorithm converges very fast. Mostly it was observed that the EM algorithm, irrespective of the initialization strategy, converged in 5 to 10 steps. Figure 16 shows the initial and the final positions of the Gaussians.
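For reference, a compact EM implementation for a mixture of M bivariate Gaussians, using the standard E and M updates. The initialization here (random training points as means, identity covariances) is a common default assumed for the sketch, not the particular strategies studied above.

```python
import numpy as np

def gauss(X, mu, sigma):
    """Multivariate Gaussian density evaluated at the rows of X."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(sigma)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return norm * np.exp(-0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff))

def em_gmm(X, M, n_iter=20, rng=None):
    """Fit an M-component Gaussian mixture to X (N x d) by EM."""
    rng = np.random.default_rng() if rng is None else rng
    N, d = X.shape
    pi = np.full(M, 1.0 / M)
    mu = X[rng.choice(N, M, replace=False)].astype(float)
    sigma = np.array([np.eye(d) for _ in range(M)])
    for _ in range(n_iter):
        # E step: responsibility of each component for each sample.
        resp = np.stack([pi[j] * gauss(X, mu[j], sigma[j])
                         for j in range(M)], axis=1)
        resp /= resp.sum(axis=1, keepdims=True)
        # M step: re-estimate weights, means, and covariances.
        Nk = resp.sum(axis=0)
        pi = Nk / N
        for j in range(M):
            mu[j] = resp[:, j] @ X / Nk[j]
            diff = X - mu[j]
            sigma[j] = (resp[:, j, None] * diff).T @ diff / Nk[j]
            sigma[j] += 1e-6 * np.eye(d)  # guard against singular covariances
    return pi, mu, sigma
```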

Figure 16: Initial Gaussians, the final positions of the Gaussians after 5 iterations, and the estimated PDF for M = 10 components and N = 500.

2. Gradient Descent Optimization

The negative log likelihood function can be minimized by the gradient descent method. The minimization is with respect to the parameters: the means and variances of the component Gaussians and the mixing parameters. The descent rate for each of the parameters was obtained by trial and error; in this case a rate of 0.9 was used for the means, 0.3 for sigma, and 0.1 for the mixing parameters.

Figure 17 shows the Kullback-Leibler distance for N = 500 and 500 test points, for different M (number of mixture components), as a function of the iteration number. As can be seen from the plot, convergence is very slow compared to the EM algorithm, and it is also very sensitive to the descent rates. Since the descent rates for the mean, variance, and mixing parameters were chosen by trial and error, choosing them more carefully could presumably give faster convergence.

Figure 17: Kullback-Leibler distance for N = 500 and 500 test points, for different M (number of mixture components), as a function of the iteration number.

Figure 18 shows the Kullback-Leibler distance for M = 10 and 500 test points, for different N, as a function of the iteration number.

Figure 18: Kullback-Leibler distance for M = 10 and 500 test points, for different N, as a function of the iteration number.

Figure 19 shows the log likelihood for N = 500 and 500 test points as a function of the iteration number for different M. As with EM, the log likelihood increases up to the point where the algorithm converges.

Figure 20: Estimated PDF for M = 10 components and N = 500. Note that the EM algorithm gives a better PDF than gradient descent for the same number of components.
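A sketch of direct gradient descent on the negative log likelihood for an isotropic Gaussian mixture. The responsibilities appear in the gradients exactly as in EM. Two details are assumptions of this sketch rather than the report's method: the mixing weights are kept valid through a softmax parameterization, and the gradients are averaged over the samples so that descent rates of the order quoted above are usable.

```python
import numpy as np

def gmm_gradient_descent(X, M, n_iter=200, lr_mu=0.9, lr_s=0.3, lr_a=0.1,
                         rng=None):
    """Minimize the negative log likelihood of an isotropic Gaussian
    mixture by gradient descent on the means, standard deviations,
    and (softmax-parameterized) mixing weights."""
    rng = np.random.default_rng() if rng is None else rng
    N, d = X.shape
    mu = X[rng.choice(N, M, replace=False)].astype(float)
    s = np.ones(M)      # per-component standard deviation
    a = np.zeros(M)     # softmax logits of the mixing weights
    for _ in range(n_iter):
        pi = np.exp(a) / np.exp(a).sum()
        # Component densities and responsibilities, shape (N, M).
        diff = X[:, None, :] - mu[None, :, :]
        sq = np.sum(diff ** 2, axis=2)
        comp = np.exp(-0.5 * sq / s ** 2) / (2 * np.pi * s ** 2) ** (d / 2)
        resp = pi * comp
        resp /= resp.sum(axis=1, keepdims=True)
        # Gradients of the negative log likelihood.
        g_mu = -np.einsum('ij,ijk->jk', resp, diff) / s[:, None] ** 2
        g_s = -np.sum(resp * (sq / s ** 3 - d / s), axis=0)
        g_a = -(resp.sum(axis=0) - N * pi)
        # Descent step, averaged over the N samples.
        mu -= lr_mu * g_mu / N
        s = np.maximum(s - lr_s * g_s / N, 1e-3)  # keep deviations positive
        a -= lr_a * g_a / N
    return np.exp(a) / np.exp(a).sum(), mu, s
```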

VI. CONCLUSION

Use the kernel based method to minimize the computational requirements (though the training data must then be kept), and use the EM algorithm to minimize both the memory and the computational requirements.

VII. EXTRA CREDIT II

The following section discusses the application of the EM algorithm to binary sequence estimation. Consider the system shown in Figure 21. B is a binary sequence of length N, $B = [b_1, b_2, \ldots, b_N]$, where each $b_i$ is either a one or a zero; a typical realization for N = 5 could be [1 0 1 1 0]. The binary sequence is scaled by a fixed, unknown, non-zero scalar c, and is then corrupted by additive white Gaussian noise.

Figure 21: A simple channel with scaling c and additive noise Z.

So we have

$Y = cB + Z$, where $Y = [y_1, \ldots, y_N]^T$, $B = [b_1, \ldots, b_N]^T$, $Z = [z_1, \ldots, z_N]^T$,

and the $z_i$ are i.i.d. Gaussian random variables with zero mean and variance $\sigma^2$, i.e. $z_i \sim N(0, \sigma^2)$.

The problem is to obtain an ML estimate of B; note that c is unknown. The ML estimation problem can be formulated as follows:

$p_{y_i|b_i}(y_i) = N(y_i; 0, \sigma^2)$ when $b_i = 0$, and $p_{y_i|b_i}(y_i) = N(y_i; c, \sigma^2)$ when $b_i = 1$.

So the ML estimator is: $b_i = 1$ if $p_{y_i|b_i=1}(y_i) > p_{y_i|b_i=0}(y_i)$, else $b_i = 0$. Simplifying, we get: $b_i = 1$ if $y_i > c/2$, else $b_i = 0$.

Here the value of c is unknown, even though it is a fixed quantity. We can use the EM algorithm by defining the complete data X = (Y, C); the E step gives an estimate of c which can then be used in the M step.

E STEP: Let D be some estimate of B. Then

$Q(B|D) = E[\log p(Y, C|B) \mid Y, D]$.

Since $p(y_i|c, B) = N(y_i; c\,b_i, \sigma^2)$, simplifying gives, up to terms that do not depend on B,

$Q(B|D) = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - a\,b_i)^2$, where $a = E[c \mid Y, D]$.

An estimate $d_i = 0$ provides no information about c, so let $Y_1$ be the subset of Y values associated with the current estimates of B which are 1s. The E step can then be summarized as

$a = E[c \mid Y, D] = E[Y_1]$,

i.e. a is the mean of the observations whose current bit estimate is 1.

M STEP: Find B to maximize Q(B|D). Since Q is a sum, each term $-(y_i - a\,b_i)^2$ can be maximized individually: set $b_i^{new} = 1$ if ($a > 0$ and $y_i > a/2$) or ($a < 0$ and $y_i < a/2$); otherwise set $b_i^{new} = 0$.

ALGORITHM:
1. Initialize $B_{old} = [1\ 1\ \ldots\ 1]$.
2. E step: $a = E[Y_1]$, where $Y_1$ is the subset of Y associated with the current estimates of B which are 1s.
3. M step: for each pair $b_i$, $y_i$, set $b_i^{new} = 1$ if ($a > 0$ and $y_i > a/2$) or ($a < 0$ and $y_i < a/2$); otherwise set $b_i^{new} = 0$.
4. Iterate till convergence.

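A direct transcription of the algorithm above into Python; the constants in the usage example (sequence length, c = 3, unit noise variance) are illustrative values in the spirit of the simulation described below.

```python
import numpy as np

def em_binary_sequence(y, n_iter=10):
    """EM estimate of a binary sequence B observed through Y = c*B + Z
    with the scaling c unknown. Returns the bit estimates and the
    final estimate of c."""
    b = np.ones(len(y), dtype=int)         # step 1: B_old = [1 1 ... 1]
    for _ in range(n_iter):
        a = y[b == 1].mean()               # E step: a = E[c | Y, D]
        if a > 0:                          # M step: maximize Q term by term
            b_new = (y > a / 2).astype(int)
        else:
            b_new = (y < a / 2).astype(int)
        if b_new.sum() == 0 or np.array_equal(b_new, b):
            b = b_new                      # converged (or degenerate all-zero)
            break
        b = b_new
    return b, a

# Usage example with illustrative constants.
rng = np.random.default_rng(0)
b_true = rng.integers(0, 2, 1000)
y = 3.0 * b_true + rng.normal(0.0, 1.0, 1000)
b_hat, c_hat = em_binary_sequence(y)
print((b_hat != b_true).mean(), c_hat)
```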
SIMULATION: The simulation was done for a sequence of length N = 1000 with c = 3. Convergence was declared when there was no further improvement in the value of the estimated c; the algorithm converged in about 2 to 3 iterations. Figure 22 shows the error in the estimation of B as a function of the iteration number for different $\sigma$; again it converges in about 2 to 3 iterations. Figure 23 shows the estimated value of c as a function of the iteration number.

Figure 22: Error in the estimation of B as a function of the iteration number, for different $\sigma$.

Figure 23: Estimated value of c as a function of the iteration number.

REFERENCES

[1] Ghahramani, Z. & Jordan, M.I. (1994). Supervised learning from incomplete data via an EM approach. In J.D. Cowan, G. Tesauro and J. Alspector, editors, Advances in Neural Information Processing Systems 6. San Mateo, CA: Morgan Kaufmann, 120-127.
[2] Bilmes, J. (1998). A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. U.C. Berkeley, TR-97-021.
[3] C.N. Georghiades and J.C. Han, "Sequence Estimation in the Presence of Random Parameters Via the EM Algorithm," IEEE Transactions on Communications, vol. 45, March 1997.

Vikas Chandrakant Raykar is a graduate student at the University of Maryland, College Park, MD 20742 USA (e-mail: vikas@umiacs.umd.edu).
