9 : Learning Partially Observed GM : EM Algorithm

10-708: Probabilistic Graphical Models, Spring

Lecturer: Eric P. Xing        Scribes: Rohan Ramanath, Rahul Goutam

1 Generalized Iterative Scaling

In this section, we summarize from the previous lecture the use of Generalized Iterative Scaling (GIS) to find the Maximum Likelihood Estimate (MLE) of a feature-based Undirected Graphical Model (UGM). Each feature $f_i$ has a weight $\theta_i$, which represents the numerical strength of the feature and whether it increases or decreases the probability of the clique. To represent these features as part of the probabilistic model, we can multiply them into the clique potentials to get:

$$p(x) = \frac{\exp\{\sum_i \theta_i f_i(x_{c_i})\}}{Z(\theta)}$$

At the maximum likelihood setting of the parameters, for each clique, the model marginals must be equal to the observed marginals, i.e. $\tilde{p}_{\mathrm{MLE}}(x) = \frac{m(x)}{N} = p(x)$, where $m(x)$ is the number of times the configuration $x$ is found out of a possible $N$. The scaled likelihood function can then be represented as:

$$\tilde{l}(\theta; D) = \frac{l(\theta; D)}{N} = \frac{1}{N} \sum_n \log p(x_n \mid \theta) = \sum_x \tilde{p}(x) \log p(x \mid \theta) = \sum_x \tilde{p}(x) \sum_i \theta_i f_i(x) - \log Z(\theta)$$

Dealing with the $\log Z(\theta)$ term directly is very complicated. The tangent to the log function forms an upper bound on it: $\log Z(\theta) \le \mu Z(\theta) - \log \mu - 1$ for all $\mu > 0$.
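
For illustration, here is a quick numerical check of that tangent bound (a minimal sketch, not part of the derivation): it evaluates the gap $\mu z - \log \mu - 1 - \log z$ on a grid and confirms it is non-negative, vanishing at $z = 1/\mu$.

    import numpy as np

    # Tangent bound on the logarithm: log z <= mu*z - log(mu) - 1 for all z, mu > 0,
    # with equality exactly where the tangent touches the curve, at z = 1/mu.
    mu = 0.25
    z = np.linspace(0.01, 20.0, 2000)
    gap = mu * z - np.log(mu) - 1.0 - np.log(z)

    assert np.all(gap >= -1e-12)                   # the bound holds everywhere on the grid
    print("smallest gap:", gap.min())              # approximately 0
    print("attained near z =", z[np.argmin(gap)])  # approximately 1/mu = 4

In the GIS derivation the bound is applied with $z = Z(\theta)$ and $\mu = Z^{-1}(\theta^{(t)})$, which is what the next step does.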

This fact is used to get a lower bound on the value of $\tilde{l}(\theta; D)$ by choosing $\mu = Z^{-1}(\theta^{(t)})$. Substituting $\Delta\theta_i^{(t)} = \theta_i - \theta_i^{(t)}$, we get:

$$\begin{aligned}
\tilde{l}(\theta; D) &\ge \sum_x \tilde{p}(x) \sum_i \theta_i f_i(x) - \frac{Z(\theta)}{Z(\theta^{(t)})} - \log Z(\theta^{(t)}) + 1 \\
&= \sum_i \theta_i \sum_x \tilde{p}(x) f_i(x) - \frac{1}{Z(\theta^{(t)})} \sum_x \exp\Big\{\sum_i \theta_i^{(t)} f_i(x)\Big\} \exp\Big\{\sum_i \Delta\theta_i^{(t)} f_i(x)\Big\} - \log Z(\theta^{(t)}) + 1 \\
&= \sum_i \theta_i \sum_x \tilde{p}(x) f_i(x) - \sum_x p(x \mid \theta^{(t)}) \exp\Big\{\sum_i \Delta\theta_i^{(t)} f_i(x)\Big\} - \log Z(\theta^{(t)}) + 1
\end{aligned}$$

We can further relax the constraints by assuming $\sum_i f_i(x) = 1$ and $f_i(x) \ge 0$. Since the exponential function is convex, $\exp\big(\sum_i \pi_i x_i\big) \le \sum_i \pi_i \exp(x_i)$ for non-negative weights $\pi_i$ that sum to one, so

$$\tilde{l}(\theta; D) \ge \sum_i \theta_i \sum_x \tilde{p}(x) f_i(x) - \sum_x p(x \mid \theta^{(t)}) \sum_i f_i(x) \exp\{\Delta\theta_i^{(t)}\} - \log Z(\theta^{(t)}) + 1 \triangleq \Lambda(\theta)$$

Setting the derivative of $\Lambda(\theta)$ with respect to $\theta_i$ to zero,

$$\frac{\partial \Lambda(\theta)}{\partial \theta_i} = \sum_x \tilde{p}(x) f_i(x) - \sum_x p(x \mid \theta^{(t)}) f_i(x) \exp\{\Delta\theta_i^{(t)}\} = 0
\;\;\Longrightarrow\;\;
\exp\{\Delta\theta_i^{(t)}\} = \frac{\sum_x \tilde{p}(x) f_i(x)}{\sum_x p^{(t)}(x) f_i(x)}$$

where $p^{(t)}(x) = p(x \mid \theta^{(t)})$ is the model distribution under the current parameters. Substituting this result into the update equations, we get:

$$p^{(t+1)}(x) = p^{(t)}(x) \prod_i e^{\Delta\theta_i^{(t)} f_i(x)} = p^{(t)}(x) \prod_i \left( \frac{\sum_x \tilde{p}(x) f_i(x)}{\sum_x p^{(t)}(x) f_i(x)} \right)^{f_i(x)}$$

$$\theta_i^{(t+1)} = \theta_i^{(t)} + \log \frac{\sum_x \tilde{p}(x) f_i(x)}{\sum_x p^{(t)}(x) f_i(x)}$$

To summarize, GIS can be used for iterative scaling on a general UGM with feature-based potentials. IPF is a special case of GIS in which the clique potentials are built on features defined as indicator functions of clique configurations.
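
To make the update concrete, here is a minimal sketch (an illustrative toy, not from the notes): a single four-state variable with one-hot indicator features, so that $\sum_i f_i(x) = 1$ holds by construction, fitted by repeating the GIS update until the model feature expectations match the empirical ones.

    import numpy as np

    # Toy model: one discrete variable with 4 states and indicator features
    # f_i(x) = 1[x = i], so sum_i f_i(x) = 1 for every x (the GIS requirement).
    F = np.eye(4)                            # F[x, i] = f_i(x)
    p_emp = np.array([0.1, 0.2, 0.3, 0.4])   # empirical distribution p~(x)
    theta = np.zeros(4)

    for _ in range(50):
        unnorm = np.exp(F @ theta)           # exp{ sum_i theta_i f_i(x) }
        p_model = unnorm / unnorm.sum()      # p(x | theta^(t))
        emp_expect = p_emp @ F               # sum_x p~(x) f_i(x)
        model_expect = p_model @ F           # sum_x p(x | theta^(t)) f_i(x)
        theta += np.log(emp_expect / model_expect)   # GIS update

    print(np.round(p_model, 3))              # converges to p_emp

With pure indicator features the update matches the empirical marginals essentially in one step, which is the IPF special case mentioned above; with overlapping features the same loop simply takes more iterations.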

2 Mixture Models

Let us consider a dataset $X$. We wish to model the data by specifying a joint distribution $p(X_i, Z_i) = p(X_i \mid Z_i)\, p(Z_i)$. Here, $Z_i \sim \mathrm{Multinomial}(\pi)$ (where $\pi_j \ge 0$, $\sum_j \pi_j = 1$, and the parameter $\pi_j$ gives $p(Z_i = j)$), and $X_i \mid Z_i = j \sim \mathcal{N}(\mu_j, \Sigma_j)$. We let $K$ denote the number of values that the $Z_i$ can take on. Thus, our model posits that each $X_i$ was generated by randomly choosing $Z_i$ from $\{1, \ldots, K\}$, and then drawing $X_i$ from one of $K$ Gaussians depending on $Z_i$. This is called the mixture of Gaussians model. Also note that the $Z_i$ are latent random variables, meaning that they are hidden/unobserved. This makes the problem tractable, as each conditional distribution $P(X_i \mid Z_i)$ can now be assumed to be unimodal. In the specific case of a mixture of Gaussians, the marginal density is factorized as:

$$P(X_n \mid \mu, \Sigma) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(X_n \mid \mu_k, \Sigma_k)$$

where $\pi_k$ is called the mixture proportion and $\mathcal{N}(X_n \mid \mu_k, \Sigma_k)$ is the Gaussian distribution of the $k$-th mixture component. The task is to learn the values of $\pi_k$, $\mu_k$ and $\Sigma_k$.

Figure 1: A mixture of 2 Gaussians

In the fully observed case, this is fairly straightforward: we would simply decompose the log likelihood function and then find the maximum likelihood estimates for all the parameters. The complete log likelihood can be expressed as:

$$l_c(\theta; D) = \log \prod_n P(x_n, z_n) = \log \prod_n P(z_n \mid \pi)\, P(x_n \mid z_n, \mu, \Sigma)
= \sum_n \log \prod_k \pi_k^{z_n^k} + \sum_n \log \prod_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)^{z_n^k}
= \sum_n \sum_k z_n^k \log \pi_k - \frac{1}{2} \sum_n \sum_k z_n^k (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) + C$$

Since the data are fully observed, the log likelihood completely decomposes. However, if we do not observe the values of $z$, as in the mixture model setting, all the parameters become coupled together, as shown below:

$$l(\theta; D) = \sum_n \log \sum_{z_n} P(x_n, z_n \mid \theta) = \sum_n \log \sum_{z_n} P(z_n \mid \theta_z)\, P(x_n \mid z_n, \theta_x)$$

The summation is now inside the log, so the log likelihood no longer decomposes nicely. Hence, it is hard to find a closed-form solution for the maximum likelihood estimates. This provides the motivation for the Expectation Maximization (EM) algorithm.
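
As a small illustration of this coupling (a sketch on made-up data, not from the notes): the complete-data log likelihood splits into per-component sums, while the marginal log likelihood keeps the sum over components inside the log, which is the quantity EM has to maximize when $z$ is hidden.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    # Toy 1-D mixture of two Gaussians with known assignments z.
    pi, mu, sigma = np.array([0.3, 0.7]), np.array([-2.0, 3.0]), np.array([1.0, 1.5])
    z = rng.choice(2, size=500, p=pi)
    x = rng.normal(mu[z], sigma[z])

    # Complete-data log likelihood: decomposes over components when z is observed.
    ll_complete = sum(np.sum(np.log(pi[k]) + norm.logpdf(x[z == k], mu[k], sigma[k]))
                      for k in range(2))

    # Incomplete (marginal) log likelihood: the sum over k sits inside the log,
    # so the parameters of the two components are coupled.
    ll_marginal = np.sum(np.log(pi[0] * norm.pdf(x, mu[0], sigma[0]) +
                                pi[1] * norm.pdf(x, mu[1], sigma[1])))

    print(ll_complete, ll_marginal)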

3 K-Means

Given a set of observations $(x_1, x_2, \ldots, x_n)$, where each observation is a $d$-dimensional real vector, $k$-means clustering aims to partition the $n$ observations into $K$ sets ($K \le n$), given by $z = \{z_1, z_2, \ldots, z_n\}$, so as to minimize the within-cluster sum of squares (WCSS). We start with initial random guesses for the class centroids and iterate between the following two steps until convergence.

Expectation Step: Use some distance metric (Euclidean distance, cosine similarity, etc.) and the current guess of the centroids to assign every data point to the nearest centroid.

$$z_n^{(t)} = \arg\min_k\; (x_n - \mu_k^{(t)})^T \big(\Sigma_k^{(t)}\big)^{-1} (x_n - \mu_k^{(t)})$$

Maximization Step: Use the new class assignments to recompute the centroid values.

$$\mu_k^{(t+1)} = \frac{\sum_n \delta(z_n^{(t)}, k)\, x_n}{\sum_n \delta(z_n^{(t)}, k)}$$

To understand the EM viewpoint, we can think of each of the clusters as being associated with some distribution, i.e. $P(x \mid z = k) \sim \mathcal{N}(x \mid \mu_k, \Sigma_k)$, and we would like to learn the parameters ($\mu_k$ and $\Sigma_k$) of each distribution.

4 EM Algorithm

The EM algorithm is an efficient iterative procedure to compute the Maximum Likelihood (ML) estimate in the presence of latent variables. In ML estimation, we wish to estimate the model parameter(s) for which the observed data are the most likely. As seen in Section 2, the maximum likelihood estimate cannot be obtained in closed form in the partially observed setting. Finding a maximum likelihood solution requires taking derivatives of the likelihood function with respect to both the parameters and the latent variables and simultaneously solving the resulting equations. In statistical models with latent variables, this usually leads to intractable solutions. The EM algorithm works around this by considering the expected complete log likelihood. Each iteration of the EM algorithm consists of two processes: the E-step and the M-step. In the expectation, or E-step, the latent variables are estimated given the observed data and the current estimate of the model parameters. This is achieved using the conditional expectation, explaining the choice of terminology. In the M-step, the likelihood function is maximized under the assumption that the latent variables are known; the estimates of the latent variables from the E-step are used in lieu of the actual latent variables.

The log likelihood from Section 2 can be written in the form of an expected log likelihood as:

$$\langle l_c(\theta; D) \rangle = \sum_n \sum_k \langle z_n^k \rangle_{P(Z \mid X; \theta)} \log \pi_k - \frac{1}{2} \sum_n \sum_k \langle z_n^k \rangle_{P(Z \mid X; \theta)} \big( (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) + \log |\Sigma_k| \big) + C$$

where $\langle z_n^k \rangle_{P(Z \mid X; \theta)}$ is the expected value of $z_n^k$ with respect to $P(Z \mid X; \theta)$. The EM algorithm maximizes $\langle l_c(\theta; D) \rangle_{P(Z \mid X; \theta)}$ by iterating between two steps (called the E and M steps). In the E-step, we compute the sufficient statistics of the hidden variables (i.e. $Z$) using the current estimate of the parameters (i.e. $\mu_k$ and $\Sigma_k$). The M-step maximizes over the parameters using the expected values of the hidden variables computed in the preceding E-step. Specifically, in the GMM, we compute $\tau_n^{k(t)} = \langle z_n^k \rangle_{P(Z \mid X; \theta^{(t)})}$ in the E-step and maximize the expected complete log likelihood with respect to the model parameters $\mu$, $\Sigma$ and $\pi$ in the M-step, to obtain:

$$\pi_k^{(t+1)} = \frac{\langle n_k \rangle_{P(Z \mid X; \theta)}}{N}, \qquad
\mu_k^{(t+1)} = \frac{\sum_n \tau_n^{k(t)} x_n}{\sum_n \tau_n^{k(t)}}, \qquad
\Sigma_k^{(t+1)} = \frac{\sum_n \tau_n^{k(t)} (x_n - \mu_k^{(t+1)})(x_n - \mu_k^{(t+1)})^T}{\sum_n \tau_n^{k(t)}}$$

where $\tau_n^{k(t)} = \langle z_n^k \rangle_{P(Z \mid X; \theta^{(t)})}$ and $\langle n_k \rangle_{P(Z \mid X; \theta)} = \sum_n \tau_n^{k(t)}$.
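
The updates above translate almost line for line into code. The following is a minimal sketch on synthetic data (illustrative only, with a small ridge term added to the covariances for numerical stability; not the lecture's reference implementation):

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_gmm(X, K, n_iter=100, seed=0):
        """Plain EM for a Gaussian mixture (sketch, no convergence test)."""
        rng = np.random.default_rng(seed)
        N, d = X.shape
        pi = np.full(K, 1.0 / K)
        mu = X[rng.choice(N, K, replace=False)]            # init means at random points
        Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)

        for _ in range(n_iter):
            # E-step: responsibilities tau[n, k] = p(z_n^k = 1 | x_n, theta)
            tau = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                            for k in range(K)], axis=1)
            tau /= tau.sum(axis=1, keepdims=True)

            # M-step: weighted MLE with weights tau
            Nk = tau.sum(axis=0)
            pi = Nk / N
            mu = (tau.T @ X) / Nk[:, None]
            for k in range(K):
                diff = X - mu[k]
                Sigma[k] = (tau[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
        return pi, mu, Sigma

    # Example: two well-separated 2-D clusters.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal([-3, 0], 1.0, size=(200, 2)),
                   rng.normal([3, 1], 0.5, size=(300, 2))])
    pi, mu, Sigma = em_gmm(X, K=2)
    print(np.round(pi, 2), np.round(mu, 2))

Replacing the soft responsibilities $\tau$ with a hard arg-max over components recovers the K-means updates from Section 3, which is the comparison drawn next.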

5 Comparison between K-means and EM

The EM algorithm for the mixture of Gaussians is a soft version of K-means. In K-means, the E-step assigns each point to a single cluster, and the M-step recomputes the cluster centers assuming each point belongs to exactly one cluster. In the EM version, the E-step assigns points to clusters in a probabilistic manner, and the M-step recomputes the cluster centroids assuming that the points are assigned to clusters probabilistically, where the weight of each point is given by $\tau_n^{k(t)}$.
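
One way to see this relationship concretely (a toy sketch under the simplifying assumptions of equal mixing proportions and a shared isotropic covariance $\sigma^2 I$, not something from the lecture): as $\sigma$ shrinks, the soft responsibilities collapse onto hard nearest-centroid assignments, recovering the K-means E-step.

    import numpy as np

    # Responsibilities under equal mixing weights and a shared covariance sigma^2 * I.
    # As sigma -> 0 the soft assignments approach the hard K-means assignment.
    def responsibilities(X, mu, sigma):
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # squared distances to centroids
        logits = -0.5 * d2 / sigma**2
        logits -= logits.max(axis=1, keepdims=True)             # numerical stability
        tau = np.exp(logits)
        return tau / tau.sum(axis=1, keepdims=True)

    X = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
    mu = np.array([[0.0, 0.0], [5.0, 5.0]])
    for sigma in [5.0, 1.0, 0.1]:
        print(sigma, np.round(responsibilities(X, mu, sigma), 3))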

6 Theory underlying EM Algorithm

The previous sections showed how the EM algorithm can be used to compute the Maximum Likelihood (ML) estimate in the presence of missing or hidden data. Let $X$ represent the set of observable variables and $Z$ represent the set of all latent variables. The probability model is $p(X, Z \mid \theta)$. If $Z$ were observed, we would define the complete log likelihood as

$$l_c(\theta; X, Z) = \log p(X, Z \mid \theta)$$

ML estimation would then amount to maximizing the complete log likelihood. If the probability $p(X, Z \mid \theta)$ factors such that separate components of $\theta$ occur in separate factors, then the log separates them into different terms which can be maximized independently (decoupling). However, since $Z$ is not observed, it has to be marginalized out, and the incomplete log likelihood function takes the following form:

$$l(\theta; X) = \log p(X \mid \theta) = \log \sum_Z p(X, Z \mid \theta)$$

The incomplete log likelihood cannot be decoupled because the summation over $p(X, Z \mid \theta)$ lies inside the logarithm. In order to handle this, we average over $Z$ to remove the randomness, using an averaging distribution $q(Z \mid X)$. The expected complete log likelihood can then be defined as:

$$\langle l_c(\theta; x, Z) \rangle_q = \sum_Z q(Z \mid x, \theta) \log p(x, Z \mid \theta)$$

The expected complete log likelihood is a deterministic function of $\theta$. It is also linear in the complete log likelihood function and, hence, inherits its factorizability. Together with an entropy term, it also provides a lower bound on the incomplete log likelihood, as shown below:

$$l(\theta; x) = \log p(x \mid \theta) = \log \sum_Z p(x, Z \mid \theta) = \log \sum_Z q(Z \mid x)\, \frac{p(x, Z \mid \theta)}{q(Z \mid x)}
\ge \sum_Z q(Z \mid x) \log \frac{p(x, Z \mid \theta)}{q(Z \mid x)} \quad \text{(Jensen's inequality)}$$

$$\Longrightarrow\quad l(\theta; x) \ge \langle l_c(\theta; x, Z) \rangle_q + H_q$$

where $H_q$ is the entropy of the distribution $q$ and does not depend on $\theta$; it is a constant for a particular distribution $q$. Hence, in order to maximize this lower bound on the incomplete log likelihood with respect to $\theta$, it is sufficient to maximize the expected complete log likelihood.

The EM algorithm can be seen as a coordinate ascent algorithm. For fixed data $x$, let a functional $F$, called the free energy, be defined as:

$$F(q, \theta) = \sum_Z q(Z \mid x) \log \frac{p(x, Z \mid \theta)}{q(Z \mid x)} \le l(\theta; x)$$

The EM algorithm is a coordinate ascent algorithm on $F$. The E-step and the M-step can then be defined as

E-step: $q^{t+1} = \arg\max_q F(q, \theta^t)$

M-step: $\theta^{t+1} = \arg\max_\theta F(q^{t+1}, \theta)$

At the $(t+1)$-th step, we maximize the free energy $F(q, \theta)$ with respect to $q$. For this optimal choice $q^{t+1}$, we maximize $F(q, \theta)$ again with respect to $\theta$ to get the updated value $\theta^{t+1}$.

Let us look at the E-step. If we set $q^{t+1}(Z \mid x)$ to the posterior distribution of the latent variables given the data and parameters, $p(Z \mid x, \theta^t)$, we maximize $F(q, \theta^t)$. The proof of this claim is given below:

$$F\big(p(Z \mid x, \theta^t), \theta^t\big) = \sum_Z p(Z \mid x, \theta^t) \log \frac{p(x, Z \mid \theta^t)}{p(Z \mid x, \theta^t)} = \sum_Z p(Z \mid x, \theta^t) \log p(x \mid \theta^t) = \log p(x \mid \theta^t) = l(\theta^t; x)$$

Since $l(\theta^t; x)$ is an upper bound on $F(q, \theta^t)$, the proof shows that $F(q, \theta^t)$ is maximized by setting $q(Z \mid x)$ to the posterior probability distribution $p(Z \mid x, \theta^t)$. The above claim can also be proved using variational calculus, or from the fact that

$$l(\theta; x) - F(q, \theta) = KL\big(q \,\|\, p(Z \mid x, \theta)\big)$$
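
This identity is easy to check numerically. The sketch below (illustrative, using a two-component mixture with a single observation and an arbitrary $q$) confirms that $l(\theta; x) = F(q, \theta) + KL(q \,\|\, p(Z \mid x, \theta))$, and that the gap vanishes when $q$ is set to the posterior.

    import numpy as np
    from scipy.stats import norm

    # One observation x from a two-component 1-D Gaussian mixture.
    x = 1.3
    pi, mu, sigma = np.array([0.4, 0.6]), np.array([0.0, 2.0]), np.array([1.0, 1.0])

    joint = pi * norm.pdf(x, mu, sigma)        # p(x, Z=k | theta)
    log_lik = np.log(joint.sum())              # l(theta; x)
    posterior = joint / joint.sum()            # p(Z | x, theta)

    def free_energy(q):
        return np.sum(q * (np.log(joint) - np.log(q)))

    def kl(q, p):
        return np.sum(q * (np.log(q) - np.log(p)))

    q = np.array([0.9, 0.1])                   # an arbitrary averaging distribution
    print(log_lik, free_energy(q) + kl(q, posterior))   # equal
    print(log_lik, free_energy(posterior))              # equal: the KL gap is zero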

Without loss of generality, we can assume that $p(x, Z \mid \theta)$ is a generalized exponential family distribution:

$$p(x, Z \mid \theta) = \frac{1}{Z(\theta)}\, h(x, Z)\, \exp\Big\{\sum_i \theta_i f_i(x, Z)\Big\}$$

A special case is when the $p(x \mid Z)$ are GLIMs; then $f_i(x, Z) = \eta_i^T(Z)\, \xi_i(x)$.

The expected complete log likelihood under $q^{t+1} = p(Z \mid x, \theta^t)$ is given below (dropping the $\theta$-independent $\langle \log h(x, Z) \rangle$ term):

$$\langle l_c(\theta; x, Z) \rangle_{q^{t+1}} = \sum_Z q(Z \mid x, \theta^t) \log p(x, Z \mid \theta)
= \sum_i \theta_i\, \langle f_i(x, Z) \rangle_{q(Z \mid x, \theta^t)} - A(\theta)
\;\overset{\text{GLIM}}{=}\; \sum_i \theta_i\, \langle \eta_i(Z) \rangle_{q(Z \mid x, \theta^t)}\, \xi_i(x) - A(\theta)$$

Now, let us analyze the M-step of the EM algorithm. The M-step can be viewed as the maximization of the expected complete log likelihood. This is shown below:

$$F(q, \theta) = \sum_Z q(Z \mid x) \log \frac{p(x, Z \mid \theta)}{q(Z \mid x)} = \sum_Z q(Z \mid x) \log p(x, Z \mid \theta) - \sum_Z q(Z \mid x) \log q(Z \mid x) = \langle l_c(\theta; x, Z) \rangle_q + H_q$$

The free energy hence breaks into two terms. The first term is the expected complete log likelihood, and the second term, which is independent of $\theta$, is the entropy. Thus, maximizing the free energy with respect to $\theta$ is equivalent to maximizing the expected complete log likelihood:

$$\theta^{t+1} = \arg\max_\theta\, \langle l_c(\theta; x, Z) \rangle_{q^{t+1}} = \arg\max_\theta \sum_Z q(Z \mid x, \theta^t) \log p(x, Z \mid \theta)$$

Under the optimal $q^{t+1}$, this is equivalent to solving a standard MLE problem for the fully observed model $p(x, Z \mid \theta)$, with the sufficient statistics involving $Z$ replaced by their expectations with respect to $p(Z \mid x, \theta^t)$.

7 Example: Hidden Markov Model

Let us look at the learning problem of Hidden Markov Models (HMMs) in the framework of the EM algorithm. The learning problem in HMMs can be either of the following.

Supervised learning: In supervised learning, we have annotated data for which the correct answer is known. Examples include a genomic region for which we have good annotations of the CpG islands, or a casino player who allows us to observe him as he changes dice and produces 10,000 rolls. Since all variables are fully observed, we can learn the parameters $\theta$ of the model by applying MLE.

Unsupervised learning: In unsupervised learning, we do not have annotated data. We only observe some of the variables, and the remaining (latent) variables remain unknown. Examples include the porcupine genome, where we do not know how frequent the CpG islands are, or 10,000 rolls of dice from a casino player without knowing when he changes dice.

Figure 2: A hidden Markov model

7.1 Baum-Welch Algorithm

The Baum-Welch algorithm is used to learn the parameters $\theta$ of an HMM. The complete log likelihood of an HMM can be written as

$$l_c(\theta; x, y) = \log p(x, y) = \log \prod_n \Big( p(y_{n,1}) \prod_{t=2}^{T} p(y_{n,t} \mid y_{n,t-1}) \prod_{t=1}^{T} p(x_{n,t} \mid y_{n,t}) \Big)$$

The expected complete log likelihood can be written as

$$\langle l_c(\theta; x, y) \rangle = \sum_n \big( \langle y_{n,1}^i \rangle_{p(y_{n,1} \mid x_n)} \log \pi_i \big)
+ \sum_n \sum_{t=2}^{T} \big( \langle y_{n,t-1}^i y_{n,t}^j \rangle_{p(y_{n,t-1}, y_{n,t} \mid x_n)} \log a_{i,j} \big)
+ \sum_n \sum_{t=1}^{T} \big( x_{n,t}^k\, \langle y_{n,t}^i \rangle_{p(y_{n,t} \mid x_n)} \log b_{i,k} \big)$$

where $A = \{a_{i,j}\} = P(Y_t^j = 1 \mid Y_{t-1}^i = 1)$ is the transition matrix and $B = \{b_{i,k}\} = P(x_t^k = 1 \mid y_t^i = 1)$ is the emission matrix.

The E-step of the EM algorithm is then

$$\gamma_{n,t}^i = \langle y_{n,t}^i \rangle = p(y_{n,t}^i = 1 \mid x_n), \qquad
\xi_{n,t}^{i,j} = \langle y_{n,t-1}^i y_{n,t}^j \rangle = p(y_{n,t-1}^i = 1,\, y_{n,t}^j = 1 \mid x_n)$$

The M-step of the algorithm is

$$\pi_i^{ML} = \frac{\sum_n \gamma_{n,1}^i}{N}, \qquad
a_{ij}^{ML} = \frac{\sum_n \sum_{t=2}^{T} \xi_{n,t}^{i,j}}{\sum_n \sum_{t=1}^{T-1} \gamma_{n,t}^i}, \qquad
b_{ik}^{ML} = \frac{\sum_n \sum_{t=1}^{T} \gamma_{n,t}^i\, x_{n,t}^k}{\sum_n \sum_{t=1}^{T} \gamma_{n,t}^i}$$
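
These re-estimation formulas are mechanical once the posteriors are available. The sketch below is illustrative only: it assumes a forward-backward pass has already produced the arrays gamma[n, t, i] and xi[n, t, i, j], and that emissions are one-hot encoded as x_onehot[n, t, k]; it implements just the M-step above.

    import numpy as np

    def baum_welch_m_step(gamma, xi, x_onehot):
        """M-step of Baum-Welch given posteriors from the E-step (forward-backward).

        gamma:    (N, T, I)      gamma[n, t, i]   = p(y_{n,t} = i | x_n)
        xi:       (N, T-1, I, I) xi[n, t, i, j]   = p(y_{n,t} = i, y_{n,t+1} = j | x_n)
        x_onehot: (N, T, K)      one-hot encoded emissions
        """
        pi = gamma[:, 0, :].sum(axis=0)
        pi /= pi.sum()                                    # initial-state distribution

        A = xi.sum(axis=(0, 1))                           # expected transition counts
        A /= gamma[:, :-1, :].sum(axis=(0, 1))[:, None]   # normalize per source state

        B = np.einsum('nti,ntk->ik', gamma, x_onehot)     # expected emission counts
        B /= gamma.sum(axis=(0, 1))[:, None]              # normalize per state
        return pi, A, B

    # Tiny smoke test with "certain" posteriors, as if the states were observed
    # (hypothetical values for a single length-3 sequence, two states, two symbols):
    gamma = np.array([[[1, 0], [0, 1], [0, 1]]], dtype=float)
    xi = np.array([[[[0, 1], [0, 0]], [[0, 0], [0, 1]]]], dtype=float)
    x_onehot = np.array([[[1, 0], [0, 1], [0, 1]]], dtype=float)
    print(baum_welch_m_step(gamma, xi, x_onehot))

The E-step quantities $\gamma$ and $\xi$ themselves come from the forward-backward recursions covered in the inference lectures.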

In the unsupervised learning case, the Baum-Welch algorithm proceeds as follows:

1. Start with the best guess of the parameters $\theta$ for the model.
2. Estimate the expected counts $A_{ij}$ and $B_{ik}$ on the training data:
   $$A_{ij} = \sum_{n,t} \langle y_{n,t-1}^i y_{n,t}^j \rangle, \qquad B_{ik} = \sum_{n,t} \langle y_{n,t}^i \rangle\, x_{n,t}^k$$
3. Update $\theta$ according to $A_{ij}$ and $B_{ik}$. The problem now becomes a supervised learning problem.
4. Repeat steps 2 and 3 until convergence.

It can be proven that we obtain a more (or equally) likely parameter set $\theta$ in each iteration.

8 EM for general BNs

Algorithm 1 shows the algorithm for EM in a general Bayesian Network.

    while not converged do
        for each node i do
            ESS_i = 0
        end
        for each data sample n do
            do inference with x_{n,H} hidden
            for each node i do
                ESS_i += <SS_i(x_{n,i}, x_{n,pi_i})>_{p(x_{n,H} | x_{n,-H})}
            end
        end
        for each node i do
            theta_i = MLE(ESS_i)
        end
    end

Algorithm 1: EM algorithm for general BN
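
As a concrete and deliberately tiny instance of Algorithm 1 (a sketch with made-up CPTs, not from the lecture), consider a two-node network with a hidden binary parent Z and an observed binary child X. One EM sweep accumulates the expected sufficient statistics for each node and then takes the MLE of each conditional probability table:

    import numpy as np

    def em_sweep(x, p_z, p_x_given_z):
        """One EM iteration for the BN Z -> X with Z hidden and X observed (binary)."""
        N = len(x)
        ess_z = np.zeros(2)            # expected counts for node Z
        ess_x = np.zeros((2, 2))       # expected counts for node X given its parent Z

        for xn in x:                   # "for each data sample n, do inference"
            joint = p_z * p_x_given_z[:, xn]   # p(Z = z, X = xn)
            post = joint / joint.sum()         # p(Z | X = xn)
            ess_z += post                      # ESS for Z
            ess_x[:, xn] += post               # ESS for X given Z

        # "theta_i = MLE(ESS_i)" for each node
        p_z = ess_z / N
        p_x_given_z = ess_x / ess_x.sum(axis=1, keepdims=True)
        return p_z, p_x_given_z

    # Note: with Z never observed, this toy model is not identifiable; the point
    # here is only the ESS bookkeeping pattern of Algorithm 1, not the estimates.
    rng = np.random.default_rng(0)
    x = rng.choice(2, size=1000, p=[0.35, 0.65])       # made-up observed data
    p_z, p_x_given_z = np.array([0.5, 0.5]), np.array([[0.7, 0.3], [0.2, 0.8]])
    for _ in range(20):
        p_z, p_x_given_z = em_sweep(x, p_z, p_x_given_z)
    print(np.round(p_z, 2), np.round(p_x_given_z, 2))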

9 Summary of EM algorithm

In summary, the EM algorithm is a way of maximizing the likelihood of latent variable models. It computes the ML estimate of the parameters by breaking the original problem into two parts:

1. The estimation of the unobserved data from the observed data and the current parameters.
2. Using this model, with all variables treated as observed, to find the ML estimates of the parameters.

The algorithm alternates between filling in the latent variables using the current best guess and updating the parameters based on this guess.

10 EM for Conditional Mixture Model

The conditional mixture model is defined as

$$P(Y \mid X) = \sum_k p(z^k = 1 \mid x, \xi)\, p(y \mid z^k = 1, x, \theta_k, \sigma_k)$$

The objective function for EM is defined as

$$\langle l_c(\theta; x, y, z) \rangle = \sum_n \langle \log p(z_n \mid x_n, \xi) \rangle_{p(z \mid x, y)} + \sum_n \langle \log p(y_n \mid x_n, z_n, \theta, \sigma) \rangle_{p(z \mid x, y)}$$

The E-step of the EM algorithm is

$$\tau_n^{k(t)} = P(z_n^k = 1 \mid x_n, y_n, \theta) = \frac{P(z_n^k = 1 \mid x_n)\, p_k(y_n \mid x_n, \theta_k, \sigma_k^2)}{\sum_j P(z_n^j = 1 \mid x_n)\, p_j(y_n \mid x_n, \theta_j, \sigma_j^2)}$$

The M-step of the EM algorithm uses the normal equation for linear regression, $\theta = (X^T X)^{-1} X^T Y$, but with the data re-weighted by $\tau$, or it uses the weighted IRLS algorithm to update $\xi$, $\theta_k$, $\sigma_k$ based on the data points $(x_n, y_n)$ with weights $\tau_n^{k(t)}$.

11 EM Variants

There are some variants of the EM algorithm.

Sparse EM: The sparse EM algorithm does not recompute the posterior probabilities that are very close to 0 for each data point under every model. Instead, it keeps an active list which it updates every once in a while.

Generalized (Incomplete) EM: Sometimes it is hard to compute the maximum likelihood estimate of the parameters in the M-step, even with the completed data. However, we can still make progress by doing an M-step that merely increases the likelihood a bit (like a gradient step).

12 EM: Report Card

EM is one of the most popular methods for obtaining the Maximum Likelihood Estimate or the Maximum a Posteriori estimate of the parameters of a model with latent, unobserved variables. EM does not require any learning rate parameter and is very fast in low dimensions. It also ensures convergence, as each iteration is guaranteed to improve the likelihood. The cons of the EM algorithm are that it can give us a local optimum instead of the global optimum, and it can be slower than conjugate gradient methods, especially near convergence. It also requires an expensive inference step, and it is a maximum likelihood/MAP method.

Disclaimer: Some portions of the content have been taken directly from the lecture slides [1] and last year's scribe notes [2]. The authors do not claim originality of the scribe.

[1] epxing/class/10708/lectures/lecture9-em.pdf
[2] epxing/class/ /lecture/scribe9.pdf
