9 : Learning Partially Observed GM : EM Algorithm


10-708: Probabilistic Graphical Models, Spring

9 : Learning Partially Observed GM : EM Algorithm

Lecturer: Eric P. Xing    Scribes: Mrinmaya Sachan, Phani Gadde, Viswanathan Sripradha

1 Introduction

So far in the course, we have studied how graphical models bring the worlds of graph theory and statistics together, providing an appealing medium by which we can model large sets of variables and independencies via a data structure that lends itself naturally to the design of efficient general-purpose algorithms. We have studied representation and inference. Given a graphical model, inference tells us how we can use the model to efficiently answer probabilistic queries. In the previous lecture, we saw another important topic in graphical models, i.e. learning. Learning is used to specify the two things that describe a GM: the graph topology (structure) and the parameters of each local distribution. We also saw a technique for learning parameters in graphical models when all the variables in the GM are fully observed, called Maximum Likelihood Estimation. We assume that the goal of learning in this case is to find the values of the parameters of each distribution which maximize the likelihood of the training data. We saw that the log-likelihood function decomposes according to the structure of the graph, and hence we can maximize the contribution to the log-likelihood of each node independently (assuming the parameters of each node are independent of those of the other nodes).

However, we often want to model some extra latent variables that we care about and learn them from data. The latent variables could be imaginary quantities meant to make the analysis of the data easier (e.g. in an HMM), or they could be real quantities that are hard or impossible to measure. Discrete latent variables are often used to partition/cluster data into sub-groups, and continuous latent variables (factors) are used for dimensionality reduction. In this lecture, we describe the expectation maximization (EM) algorithm, which can be used to learn model parameters in the partially observed setting.

2 Mixture Models

We motivate the EM algorithm through the well-known mixture model example. Consider the problem of clustering a set of N observed data points x_1, ..., x_N into K clusters. One way to approach the clustering problem is to estimate the marginal density P(X). In general this can be hard, since P(X) can be multimodal. A mixture model is a way of constraining the form of P(X) so that the problem becomes tractable and yet has sufficient power to model real-world data. A typical finite-dimensional mixture model (shown in Figure 1) consists of:

- N random variables corresponding to the observations, each assumed to be distributed according to a mixture of K components, with each component belonging to the same parametric family of distributions but with different parameters.

- N latent variables specifying the identity of the mixture component for each observation, each distributed according to a K-dimensional categorical distribution.

- A set of K multinomially distributed mixture weights.

- K sets of parameters, each set specifying the parameters of the corresponding mixture component.

Figure 1: A graphical model representation of the Gaussian mixture model.

The graphical model allows the factorization of the marginal density as:

P(X_n) = \sum_{Z_n} P(X_n \mid Z_n) P(Z_n)

This makes the problem tractable, as each conditional distribution P(X_n | Z_n) can now be assumed to be unimodal. In the specific case of a mixture of Gaussians, the marginal density factorizes as:

P(x_n \mid \mu, \Sigma) = \sum_k \pi_k \, N(x_n \mid \mu_k, \Sigma_k)

In the above equation, we have:

- \pi_k : the mixture proportion of component k
- N(x_n \mid \mu_k, \Sigma_k) : the Gaussian distribution of the k-th mixture component

Our task is to learn the values of \pi_k, \mu_k and \Sigma_k. Note that if we were working in a fully observed scenario, this would be relatively easy: we would simply decompose the log-likelihood function and then find the maximum likelihood estimates of all the parameters. In the fully observed scenario, the log-likelihood can be written as:

\ell(\theta; D) = \log \prod_n P(x_n, z_n) = \log \prod_n P(z_n \mid \pi) \, p(x_n \mid z_n, \mu, \Sigma)
              = \sum_n \log \prod_k \pi_k^{z_n^k} + \sum_n \log \prod_k N(x_n \mid \mu_k, \Sigma_k)^{z_n^k}
              = \sum_n \sum_k z_n^k \log \pi_k - \frac{1}{2} \sum_n \sum_k z_n^k (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) + C
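To make the fully observed case concrete, the following is a minimal sketch (in Python with NumPy; not part of the original notes) of the closed-form MLE when each x_n is observed together with its component label z_n. The function name and array layout are illustrative assumptions.

import numpy as np

def gmm_mle_fully_observed(X, z, K):
    """Closed-form MLE for a Gaussian mixture when the labels z are observed.

    X: (N, d) data matrix, z: (N,) integer labels in {0, ..., K-1}.
    Returns mixture weights pi, means mu, covariances Sigma.
    """
    N, d = X.shape
    pi = np.zeros(K)
    mu = np.zeros((K, d))
    Sigma = np.zeros((K, d, d))
    for k in range(K):
        Xk = X[z == k]                      # points labeled as component k
        pi[k] = len(Xk) / N                 # fraction of data in component k
        mu[k] = Xk.mean(axis=0)             # sample mean of component k
        diff = Xk - mu[k]
        Sigma[k] = diff.T @ diff / len(Xk)  # sample covariance of component k
    return pi, mu, Sigma

Each parameter is estimated from the points of its own component only, which is exactly the per-component decomposition of the log-likelihood above.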

Parameters can be easily learned in the completely observed case because the log-likelihood decomposes. However, if we do not observe the values of z_n, as in the mixture model setting, all the parameters become coupled together via marginalization:

\ell(\theta; D) = \log \sum_{z} P(X, z \mid \theta) = \log \sum_{z} P(z \mid \theta_z) \, P(X \mid z, \theta_x)

Notice the summation inside the log. The log-likelihood function no longer decomposes nicely, and hence it is hard to find closed-form maximum likelihood estimates. This is why we introduce the EM algorithm.

3 K-Means

Recall the K-means algorithm. K-means is a clustering technique that aims to partition n observations into K clusters in which each observation belongs to the cluster with the nearest mean, resulting in a partitioning of the data space into Voronoi cells. We start with initial random guesses for the class centroids and then iterate between the following two steps until convergence (when the class centroids stop changing):

1. Using the guesses for the class centroids, assign each data point to the nearest centroid using some distance metric:

z_n^{(t)} = \arg\min_k \, (x_n - \mu_k^{(t)})^T \Sigma_k^{-1(t)} (x_n - \mu_k^{(t)})    (1)

2. Using the new class assignments, recompute the class centroids:

\mu_k^{(t+1)} = \frac{\sum_n \delta(z_n^{(t)}, k) \, x_n}{\sum_n \delta(z_n^{(t)}, k)}    (2)

Now let us generalize the problem. We would like to think of each of our clusters as being associated with some distribution, e.g. P(x | z = k) ~ N(\mu_k, \Sigma_k), and we would like to learn the parameters (\mu_k and \Sigma_k) of each distribution. In this simple example, we will show that the updates described in equations 1 and 2 are in fact what the EM algorithm for K-means does.

4 EM Algorithm

We now introduce the EM algorithm. We have seen that the maximum likelihood estimator cannot, in general, be used directly in a partially observed setting. Finding a maximum likelihood solution requires taking derivatives of the likelihood function with respect to both the parameters and the latent variables and simultaneously solving the resulting equations. In statistical models with latent variables, this usually leads to intractable solutions: the result is typically a set of interlocking equations in which the solution for the parameters requires the values of the latent variables and vice versa, and substituting one set of equations into the other produces an unsolvable system. The EM algorithm works around this by considering the expected complete log-likelihood. In the Gaussian mixture model setting, the expected complete log-likelihood can be written as:

\langle \ell_c(\theta; x, z) \rangle = \sum_n \sum_k \langle z_n^k \rangle_{P(Z \mid X; \theta)} \log \pi_k - \frac{1}{2} \sum_n \sum_k \langle z_n^k \rangle_{P(Z \mid X; \theta)} \left( (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) + \log |\Sigma_k| + C \right)    (3)

Here, \langle \cdot \rangle_{P(Z \mid X; \theta)} denotes the expectation with respect to P(Z | X; \theta).
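The expectations \langle z_n^k \rangle appearing in (3) are posterior probabilities of the component assignments under the current parameters. A minimal sketch of this computation (illustrative, not from the lecture; it assumes NumPy and SciPy):

import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, mu, Sigma):
    """E-step quantities tau[n, k] = <z_n^k> = P(z_n = k | x_n; theta).

    X: (N, d) data; pi: (K,) weights; mu: (K, d) means; Sigma: (K, d, d) covariances.
    """
    N, K = X.shape[0], len(pi)
    tau = np.zeros((N, K))
    for k in range(K):
        # joint density pi_k * N(x_n | mu_k, Sigma_k) for every data point
        tau[:, k] = pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
    tau /= tau.sum(axis=1, keepdims=True)  # normalize over components (Bayes rule)
    return tau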

The EM algorithm maximizes \langle \ell_c(\theta; x, z) \rangle_{P(Z \mid X; \theta)} by iterating between two steps (called the E and M steps). The E step computes the expected values of the sufficient statistics of the hidden variables (i.e., Z) given the current estimate of the parameters (i.e., \pi and \mu), and the M step recomputes the parameters given the current expected values of the hidden variables. Hence, in the Gaussian mixture model, we compute \langle z_n^k \rangle_{P(Z \mid X; \theta)} in the E step and maximize (3) with respect to the model parameters \pi, \mu and \Sigma in the M step to obtain:

\pi_k^{(t+1)} = \frac{\langle n_k \rangle_{P(Z \mid X; \theta)}}{N}

\mu_k^{(t+1)} = \frac{\sum_n \tau_n^{k(t)} x_n}{\sum_n \tau_n^{k(t)}}

\Sigma_k^{(t+1)} = \frac{\sum_n \tau_n^{k(t)} (x_n - \mu_k^{(t+1)})(x_n - \mu_k^{(t+1)})^T}{\sum_n \tau_n^{k(t)}}

where \tau_n^{k(t)} = \langle z_n^k \rangle_{P(Z \mid X; \theta^{(t)})} and \langle n_k \rangle_{P(Z \mid X; \theta)} = \sum_n \tau_n^{k(t)}.

5 Comparison between K-means and EM

Next, we compare the K-means and EM algorithms for a mixture of Gaussians. The EM algorithm for a mixture of Gaussians is like a soft version of the K-means algorithm. In K-means, in the E-step we assign points to clusters, and in the M-step we recompute the cluster centers assuming each point belongs to a single cluster. In the E-step of the EM version, we assign points to clusters in a soft, probabilistic manner, and in the M-step we recompute the class centers assuming that the points are assigned to clusters probabilistically, where the weight of each point is given by \tau_n^{k(t)}.
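Below is a minimal sketch of the resulting EM loop for a Gaussian mixture, reusing the responsibilities() function sketched in Section 4; all names are illustrative and no convergence check is included. Replacing the soft tau with a hard one-hot argmax assignment recovers the K-means updates (1) and (2).

import numpy as np

def em_gmm(X, pi, mu, Sigma, n_iter=100):
    """EM for a Gaussian mixture model: soft E-step, closed-form M-step."""
    N = X.shape[0]
    K = len(pi)
    for _ in range(n_iter):
        # E-step: tau[n, k] = <z_n^k>, using the responsibilities() sketch from Section 4.
        # (Hard-assigning tau to a one-hot argmax here would give exactly K-means.)
        tau = responsibilities(X, pi, mu, Sigma)
        Nk = tau.sum(axis=0)                      # effective counts <n_k>
        # M-step: maximize the expected complete log-likelihood (3)
        pi = Nk / N
        mu = (tau.T @ X) / Nk[:, None]
        Sigma = np.stack([
            ((tau[:, k, None] * (X - mu[k])).T @ (X - mu[k])) / Nk[k]
            for k in range(K)
        ])
    return pi, mu, Sigma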

6 Theory underlying the EM Algorithm

Having introduced the algorithm, we now analyze the theory underlying it. We have seen how the EM algorithm can be effectively used to compute the Maximum Likelihood (ML) estimate in the presence of unobserved (i.e. hidden) data, so, not surprisingly, our investigation begins with the likelihood function. Let X denote the observed variable(s) and Z the latent variable(s). Had Z been observed, we could have defined the complete log-likelihood \ell_c(\theta; X, Z) = \log P(X, Z \mid \theta) and proceeded with ML estimation. In the partially observed case, however, the complete log-likelihood is a random quantity and hence cannot be maximized directly. With Z unobserved, we instead define the incomplete log-likelihood, which is the log of a marginal probability; in it, the parameters are coupled:

\ell(\theta; X) = \log P(X \mid \theta) = \log \sum_Z P(X, Z \mid \theta)

To counter this, we introduce the expected complete log-likelihood with respect to a distribution q(Z):

\langle \ell_c(\theta; X, Z) \rangle_q = \sum_Z q(Z \mid X, \theta) \log P(X, Z \mid \theta)

It is easy to see that the expected complete log-likelihood is a deterministic function of \theta; it is linear in the complete log-likelihood and inherits its factorizability. We claim that maximizing this surrogate can yield a maximizer of the likelihood in the partially observed scenario. This is due to a nice relationship between the likelihood and the expected complete log-likelihood, derived below:

\ell(\theta; X) = \log P(X \mid \theta)
              = \log \sum_Z P(X, Z \mid \theta)
              = \log \sum_Z q(Z \mid X) \frac{P(X, Z \mid \theta)}{q(Z \mid X)}
              \geq \sum_Z q(Z \mid X) \log \frac{P(X, Z \mid \theta)}{q(Z \mid X)}

and hence

\ell(\theta; X) \geq \langle \ell_c(\theta; X, Z) \rangle_q + H_q    (4)

where H_q is the entropy of q(Z | X) and does not depend on \theta. Jensen's inequality (in its probabilistic form) is used to go from the third line to the fourth. From this derivation, we see that \langle \ell_c(\theta; X, Z) \rangle_q + H_q is a lower bound on the original likelihood. Note that H_q is constant for a chosen distribution q; hence, to maximize the original likelihood for that q, it suffices to maximize the lower bound, i.e. the expected complete log-likelihood. This technique is actively used in physics, and we next describe this connection. For fixed data X, we define a functional called the free energy:

F(q, \theta) = \sum_Z q(Z \mid X) \log \frac{P(X, Z \mid \theta)}{q(Z \mid X)} \leq \ell(\theta; X)

Another way to see the EM algorithm is as a joint maximization procedure that iteratively maximizes a better and better lower bound F on the true likelihood \ell(\theta; X). The EM algorithm is then just coordinate ascent on F, given by the following steps:

1. E step: q^{t+1} = \arg\max_q F(q, \theta^t)

2. M step: \theta^{t+1} = \arg\max_\theta F(q^{t+1}, \theta)

This joint maximization view of EM is useful because it leads to many variants of the EM algorithm that use alternative strategies to maximize the free energy, for example by performing only a partial maximization in the first step. Next, we show that this coordinate ascent is exactly the EM algorithm.

First, we look at the E-step. Here we want to show that the q which maximizes F(q, \theta^t) is the posterior distribution over the latent variables given the data and the current value of the parameters (a quantity that is also useful at test time, e.g. for classification). In other words:

q^{t+1} = \arg\max_q F(q, \theta^t) = P(Z \mid X, \theta^t)

It is easily verified that this choice attains the bound \ell(\theta^t; X) \geq F(q, \theta^t):

F(P(Z \mid X, \theta^t), \theta^t) = \sum_Z P(Z \mid X, \theta^t) \log \frac{P(X, Z \mid \theta^t)}{P(Z \mid X, \theta^t)}
                                   = \sum_Z P(Z \mid X, \theta^t) \log P(X \mid \theta^t)
                                   = \log P(X \mid \theta^t) = \ell(\theta^t; X)

This result can alternatively be shown using variational calculus, or from the fact that \ell(\theta; X) - F(q, \theta) = KL(q \,\|\, P(Z \mid X, \theta)).
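As a quick numerical illustration (toy values, not from the lecture), the following checks that F(q, \theta) lower-bounds \ell(\theta; X) for an arbitrary q and becomes tight when q is the posterior P(Z | X, \theta), for a two-component one-dimensional mixture and a single observation:

import numpy as np

# toy two-component 1-D Gaussian mixture and one observation x (illustrative numbers)
pi = np.array([0.3, 0.7])
mu, sigma = np.array([-1.0, 2.0]), np.array([1.0, 0.5])
x = 0.4

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

joint = pi * gauss(x, mu, sigma)          # p(x, z) for z = 0, 1
loglik = np.log(joint.sum())              # incomplete log-likelihood l(theta; x)

def free_energy(q):
    return np.sum(q * np.log(joint / q))  # F(q, theta) = <l_c>_q + H_q

posterior = joint / joint.sum()           # p(z | x, theta)
q_arbitrary = np.array([0.5, 0.5])

print(free_energy(q_arbitrary) <= loglik)           # True: F is a lower bound
print(np.isclose(free_energy(posterior), loglik))   # True: the bound is tight at the posterior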

Without loss of generality, let us assume that p(X, Z | \theta) is a generalized exponential family distribution:

p(X, Z \mid \theta) = \frac{1}{Z(\theta)} h(X, Z) \exp\Big\{ \sum_i \theta_i f_i(X, Z) \Big\}

As a special case, when the p(X | Z) are GLMs, f_i(X, Z) = \eta_i^T(Z) \, \xi_i(X). The expected complete log-likelihood under q^{t+1} = p(Z \mid X, \theta^t) is:

\langle \ell_c(\theta; X, Z) \rangle_{q^{t+1}} = \sum_Z q(Z \mid X, \theta^t) \log p(X, Z \mid \theta)
                                             = \sum_i \theta_i \langle f_i(X, Z) \rangle_{q(Z \mid X, \theta^t)} - A(\theta)
                                             = \sum_i \theta_i \langle \eta_i(Z) \rangle_{q(Z \mid X, \theta^t)} \, \xi_i(X) - A(\theta)

Next, we look at the M-step. As we saw when deriving equation 4, the free energy breaks into the following two terms:

F(q, \theta) = \sum_Z q(Z \mid X) \log \frac{p(X, Z \mid \theta)}{q(Z \mid X)}
             = \sum_Z q(Z \mid X) \log p(X, Z \mid \theta) - \sum_Z q(Z \mid X) \log q(Z \mid X)
             = \langle \ell_c(\theta; X, Z) \rangle_q + H_q

The first term is the expected complete log-likelihood (the energy), and the second term, which does not depend on \theta, is the entropy. Thus, in the M-step, maximizing F with respect to \theta for a fixed distribution q is equivalent to maximizing the first term:

\theta^{t+1} = \arg\max_\theta \langle \ell_c(\theta; X, Z) \rangle_{q^{t+1}} = \arg\max_\theta \sum_Z q^{t+1}(Z \mid X) \log p(X, Z \mid \theta)

Under the optimal q^{t+1}, this is equivalent to solving a standard MLE problem for the fully observed model p(X, Z | \theta), with the sufficient statistics involving Z replaced by their expectations w.r.t. p(Z | X, \theta^t).

7 Example: Hidden Markov Model

We have seen inference in the Hidden Markov Model in previous lectures. Now we are well equipped to tackle the issue of learning in HMMs. Learning in HMMs can be categorized into supervised and unsupervised learning. Supervised learning is the case when the right answer is known, e.g. a genomic region x = x_1 ... x_{1,000,000} for which we have good (experimental) annotations of the CpG islands, or a casino player who allows us to observe him one evening as he changes dice and produces (say) 10,000 rolls. In this case, estimation can be achieved by ML estimation, since all the variables are fully observed. In the unsupervised scenario, we only observe some of the random variables, e.g. the porcupine genome, for which we do not know how frequent the CpG islands are, nor their composition; or the case where we observe the 10,000 rolls of the casino player but do not see when he changes the dice.
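In the supervised case, ML estimation indeed reduces to counting and normalizing. A minimal sketch (illustrative, not from the notes; it assumes discrete observations, integer-coded states, and that every state occurs at least once in the training data):

import numpy as np

def hmm_mle_supervised(state_seqs, obs_seqs, M, K):
    """ML estimates for a discrete HMM when the state paths are observed.

    state_seqs, obs_seqs: lists of equal-length integer sequences
    (states in {0..M-1}, observations in {0..K-1}).
    Returns initial distribution pi, transition matrix A, emission matrix B.
    """
    pi = np.zeros(M)
    A = np.zeros((M, M))
    B = np.zeros((M, K))
    for y, x in zip(state_seqs, obs_seqs):
        pi[y[0]] += 1                      # count initial states
        for t in range(1, len(y)):
            A[y[t - 1], y[t]] += 1         # count transitions i -> j
        for t in range(len(y)):
            B[y[t], x[t]] += 1             # count emissions of symbol k in state i
    # normalize counts into probabilities
    pi /= pi.sum()
    A /= A.sum(axis=1, keepdims=True)
    B /= B.sum(axis=1, keepdims=True)
    return pi, A, B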

Here, we introduce a well-known EM algorithm for learning in HMMs, the Baum-Welch algorithm. Recall the complete log-likelihood of an HMM:

\ell_c(\theta; X, Y) = \log P(X, Y) = \log \prod_n \Big( p(y_{n,1}) \prod_{t=2}^{T} p(y_{n,t} \mid y_{n,t-1}) \prod_{t=1}^{T} p(x_{n,t} \mid y_{n,t}) \Big)

The expected complete log-likelihood can be written as:

\langle \ell_c(\theta; X, Y) \rangle = \sum_n \langle y_{n,1}^i \rangle_{p(y_{n,1} \mid x_n)} \log \pi_i + \sum_n \sum_{t=2}^{T} \langle y_{n,t-1}^i y_{n,t}^j \rangle_{p(y_{n,t-1}, y_{n,t} \mid x_n)} \log a_{i,j} + \sum_n \sum_{t=1}^{T} x_{n,t}^k \langle y_{n,t}^i \rangle_{p(y_{n,t} \mid x_n)} \log b_{i,k}

The EM updates can be obtained as follows.

E step:

\gamma_{n,t}^i = \langle y_{n,t}^i \rangle = p(y_{n,t}^i = 1 \mid x_n)

\xi_{n,t}^{i,j} = \langle y_{n,t-1}^i y_{n,t}^j \rangle = p(y_{n,t-1}^i = 1, y_{n,t}^j = 1 \mid x_n)

M step:

\pi_i^{ML} = \frac{\sum_n \gamma_{n,1}^i}{N}

a_{i,j}^{ML} = \frac{\sum_n \sum_{t=2}^{T} \xi_{n,t}^{i,j}}{\sum_n \sum_{t=1}^{T-1} \gamma_{n,t}^i}

b_{i,k}^{ML} = \frac{\sum_n \sum_{t=1}^{T} \gamma_{n,t}^i \, x_{n,t}^k}{\sum_n \sum_{t=1}^{T} \gamma_{n,t}^i}

In the unsupervised setting, we are given x = x_1 ... x_N for which the true state path y = y_1 ... y_N is unknown, and we can use the EM algorithm here too. The algorithm is as follows:

1. Start with a best guess of the model M and parameters \theta.

2. Compute the expected counts A_{i,j} = \sum_{n,t} \langle y_{n,t-1}^i y_{n,t}^j \rangle and B_{i,k} = \sum_{n,t} \langle y_{n,t}^i \rangle x_{n,t}^k from the training data.

3. Update \theta using A_{i,j} and B_{i,k} as computed above, treating the problem as a supervised learning problem.

4. Repeat until convergence.

The EM algorithm is very powerful and widely used in practice. Next, we lay out the general technique for EM in Bayesian networks (Algorithm 1).

8 Summary of the EM algorithm

The EM algorithm is a way of maximizing the likelihood function of latent variable models. It finds the MLE of the parameters when the original (hard) problem can be broken up into two (easier) pieces: estimating the missing or unobserved data from the observed data and the current parameters, and then using this completed data to find the maximum likelihood parameter estimates. The algorithm alternates between filling in the latent variables with the best guess (the posterior) and updating the parameters based on this guess. In the M-step, we optimize a lower bound on the likelihood, and in the E-step we close the gap, making the bound tight.
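As a concrete instance of this recipe, here is a minimal sketch of the Baum-Welch M-step from Section 7, assuming the E-step expectations \gamma and \xi have already been computed (e.g. by the forward-backward algorithm). The function name, argument shapes, and the assumption of discrete observations are illustrative, not from the notes.

import numpy as np

def baum_welch_m_step(gammas, xis, obs_seqs, K):
    """Baum-Welch M-step given the E-step expectations.

    gammas: list of (T, M) arrays, gamma[t, i] = p(y_t = i | x_{1:T})
    xis:    list of (T-1, M, M) arrays, xi[t, i, j] = p(y_t = i, y_{t+1} = j | x_{1:T})
    obs_seqs: list of length-T integer observation sequences over {0..K-1}.
    """
    M = gammas[0].shape[1]
    pi = sum(g[0] for g in gammas) / len(gammas)     # pi_i = sum_n gamma_{n,1}^i / N
    A = sum(xi.sum(axis=0) for xi in xis)            # expected transition counts
    A /= A.sum(axis=1, keepdims=True)
    B = np.zeros((M, K))
    for g, x in zip(gammas, obs_seqs):
        for t, k in enumerate(x):
            B[:, k] += g[t]                          # expected emission counts
    B /= B.sum(axis=1, keepdims=True)
    return pi, A, B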

Algorithm 1: EM for general BNs

while not converged:
    % E-step
    for each node i: ESS_i = 0            % reset expected sufficient statistics
    for each data sample n:
        do inference over the hidden variables X_{n,H}
        for each node i: ESS_i += \langle SS_i(X_{n,i}, X_{n,\pi_i}) \rangle_{p(X_{n,H} \mid X_{n,-H})}
    % M-step
    for each node i: \theta_i = MLE(ESS_i)

9 EM Variants

Next, we discuss the application of EM to specific models such as the conditional mixture model (the mixture of experts model, which models p(Y | X) using different experts, each responsible for a different region of the input space), the hierarchical mixture of experts, and the mixture of overlapping experts. We point the reader to the lecture slides for details.

Finally, two variations of the EM algorithm, namely Sparse EM and Generalized (Incomplete) EM, were introduced. In Sparse EM, we do not recompute the posterior probability of every data point under every model exactly, because for most components it is almost zero; instead, we keep an active list which is updated once in a while. In Generalized (Incomplete) EM, it might be hard to find the ML parameters in the M-step even given the completed data. However, we can still make progress by doing an M-step that merely improves the likelihood a bit (e.g. a gradient step). This is in line with the IRLS step in the mixture of experts model.

10 EM: Report Card

EM is one of the most popular algorithms for learning in a partially observed setting. It does not require any cumbersome tuning of learning rates or step-size parameters, as similar gradient-based methods do. Parameter constraints are automatically enforced. It is very fast in low dimensions, and convergence is guaranteed in the sense that the likelihood improves at every iteration. However, there are some issues with the EM approach. The EM algorithm can get stuck in local optima, as the optimization is non-convex in general. It can be slower than conjugate gradient methods (especially near convergence). It requires expensive inference steps, and it is a maximum likelihood/MAP method (i.e., it yields only point estimates). These issues lead us to seek other techniques such as Gibbs Sampling and Variational Inference, which we will see in the lectures to follow.

Disclaimer: Some portions of the content have been taken directly from the lecture slides [1], last year's scribe notes [2] and Kevin Murphy's introduction to GMs and BNs [3]. The authors do not claim originality of the scribe.

[1] epxing/class/10708/lecture/lecture9-em.pdf
[2] epxing/class/ /lecture/scribe10.pdf
[3] murphy/bayes/bnintro.html
