EXPECTATION PROPAGATION
Rice University STAT 631 / ELEC 633: Graphical Models, September 30, 2008
Instructor: Dr. Volkan Cevher
Scribes: Ahmad Beirami, Andrew Waters, Matthew Nokleby
Index terms: approximate inference, expectation propagation, exponential family.

1 Motivation and Introduction

In class, we have encountered inference problems that are too complex to solve exactly. As a result, we have looked at methods, such as Laplace's method and variational Bayes, that simplify the problem. In such methods, we simplify the form of the distributions (by making them a product of simpler distributions, for example) while minimizing some error metric (such as the KL-divergence).

Here, we focus on the expectation propagation (EP) algorithm [1, 2]. EP is a deterministic algorithm which approximates the true distributions with exponential-family distributions. Specifically, EP applies to probabilistic models where the joint distribution has the factorization

    p(x, \theta) = \prod_i f_i(\theta),                                   (1)

where the factors f_i(θ) are

    f_0(\theta) = p(\theta),                                              (2)
    f_i(\theta) = p(x_i \mid \theta), \quad i = 1, \dots, n,              (3)

where x is the observed data and θ is the latent variable whose distribution we hope to infer. (Note that the factors f_i(θ) do indeed depend on x.) So, the data x_i are conditionally independent given θ. We can compute the posterior according to Bayes' rule:

    p(\theta \mid x) = \frac{1}{p(x)} \prod_i f_i(\theta),                (4)

where the evidence p(x) is

    p(x) = \int \prod_i f_i(\theta) \, d\theta.                           (5)

Of course, if the factors f_i(θ) are complicated, computing the posterior may be difficult. So, we approximate p(θ|x) with a mathematically tractable distribution q(θ). In EP, we choose q(θ) to be of the form

    q(\theta) = \frac{1}{Z} \prod_i \tilde f_i(\theta),                   (6)

where each f̃_i(θ) is a member of the exponential family. We will review the exponential family in the next section, but for now it is enough to say that the exponential family is considerably easier to work with, which simplifies the inference problem.
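As a concrete illustration of (1)-(5), the following minimal Python sketch computes the posterior and the evidence for a scalar θ by brute-force integration on a grid. The Gaussian prior, the mixture likelihood, and all numbers are arbitrary choices for illustration only; the point is that when the factors are complicated and θ is not low-dimensional, the integral in (5) is no longer tractable, which motivates EP.

    import numpy as np
    from scipy.stats import norm

    # Illustrative toy model (assumed for this sketch): scalar latent theta with a
    # Gaussian prior f_0 and a clutter-style mixture likelihood f_i for each y_i.
    theta = np.linspace(-10, 10, 4001)           # grid over the latent variable
    dtheta = theta[1] - theta[0]
    y = np.array([1.2, 0.8, 1.5, 6.0])           # toy observations

    joint = norm.pdf(theta, loc=0.0, scale=3.0)  # f_0(theta) = p(theta)
    for yi in y:                                 # multiply in each factor f_i(theta) = p(y_i | theta)
        joint *= 0.8 * norm.pdf(yi, loc=theta, scale=1.0) + 0.2 * norm.pdf(yi, loc=0.0, scale=5.0)

    evidence = joint.sum() * dtheta              # eq. (5): p(x) = integral of the product of factors
    posterior = joint / evidence                 # eq. (4): Bayes' rule
    print("evidence p(y) =", evidence)
    print("posterior mean =", (theta * posterior).sum() * dtheta)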

Each factor f̃_i(θ) in the approximation corresponds to one factor f_i(θ). We want to find the best exponential-family approximation of p according to a meaningful error metric. For EP, our error metric is KL(p‖q). Notice that we do not use KL(q‖p) as we did previously in variational Bayes; we have already discussed the differences between the two. For example, KL(q‖p) gives poor results when the original distribution has multiple modes. However, even assuming that q(θ) belongs to the exponential family, minimizing KL(p‖q) directly is not feasible. Instead, EP performs iterations which, in practice, satisfactorily reduce KL(p‖q). But there are no guarantees of optimality, or even convergence, of the algorithm.

The remainder of the document is organized as follows. In Section 2 we briefly review the exponential family. In Section 3 we detail the steps of the expectation propagation algorithm. We conclude with two examples, borrowed from [1], in Section 4.

2 Exponential Family

In this section, we briefly review the exponential family. A distribution f(θ) from the exponential family has the form

    f(\theta) = h(\theta)\, g(\eta) \exp\{\eta^T u(\theta)\},             (7)

where η is a vector of natural parameters, and the functions h(θ), g(η), and u(θ) are known. Although it has a simple form, the exponential family is a broad family, which includes, for example, the Gaussian distribution. A bivariate Gaussian distribution looks like

    p(x \mid \mu, \Sigma) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\left\{-\tfrac12 (x-\mu)^T \Sigma^{-1} (x-\mu)\right\},    (8)

for x = (x_1, x_2)ᵀ and Λ = Σ⁻¹. We can rewrite this in the form of (7):

    p(x) = h(x)\, g(\eta) \exp\{\eta^T u(x)\},                            (9)

where we choose

    u(x) = (x_1,\; x_2,\; x_1 x_2,\; x_1^2,\; x_2^2)^T, \qquad
    \eta = (\Lambda_{11}\mu_1 + \Lambda_{12}\mu_2,\;\; \Lambda_{22}\mu_2 + \Lambda_{21}\mu_1,\;\; -\Lambda_{12},\;\; -\tfrac12\Lambda_{11},\;\; -\tfrac12\Lambda_{22})^T.    (10)

We also choose h(x) = 1 and

    g(\eta) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\left\{-\tfrac12 \Lambda_{11}\mu_1^2 - \Lambda_{12}\mu_1\mu_2 - \tfrac12 \Lambda_{22}\mu_2^2\right\},    (11)

which can be rewritten purely in terms of η as

    g(\eta) = \frac{\sqrt{4\eta_4\eta_5 - \eta_3^2}}{2\pi} \exp\left\{-\tfrac12(\eta_1\mu_1 + \eta_2\mu_2)\right\},    (12)

where

    \mu_1 = \frac{\eta_2\eta_3 - 2\eta_1\eta_5}{4\eta_4\eta_5 - \eta_3^2},    (13)
    \mu_2 = \frac{\eta_1\eta_3 - 2\eta_2\eta_4}{4\eta_4\eta_5 - \eta_3^2}.    (14)
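To make the change of parameterization in (8)-(14) concrete, here is a small Python sketch (an illustration added here, with arbitrary example numbers, assuming NumPy/SciPy) that maps a bivariate Gaussian (μ, Σ) to the natural parameters η of (10) and checks that h(x) g(η) exp{ηᵀu(x)} reproduces the density (8).

    import numpy as np
    from scipy.stats import multivariate_normal

    def natural_params(mu, Sigma):
        # Natural parameters for a bivariate Gaussian with the ordering
        # u(x) = (x1, x2, x1*x2, x1^2, x2^2) used in eq. (10).
        L = np.linalg.inv(Sigma)                       # Lambda = Sigma^{-1}
        return np.array([L[0, 0] * mu[0] + L[0, 1] * mu[1],
                         L[1, 1] * mu[1] + L[1, 0] * mu[0],
                         -L[0, 1], -0.5 * L[0, 0], -0.5 * L[1, 1]])

    def g(eta):
        # Normalizer g(eta) written purely in terms of eta, via eqs. (12)-(14).
        det = 4 * eta[3] * eta[4] - eta[2] ** 2
        mu1 = (eta[1] * eta[2] - 2 * eta[0] * eta[4]) / det     # eq. (13)
        mu2 = (eta[0] * eta[2] - 2 * eta[1] * eta[3]) / det     # eq. (14)
        return np.sqrt(det) / (2 * np.pi) * np.exp(-0.5 * (eta[0] * mu1 + eta[1] * mu2))

    def u(x):
        return np.array([x[0], x[1], x[0] * x[1], x[0] ** 2, x[1] ** 2])

    mu = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
    eta = natural_params(mu, Sigma)

    x = np.array([0.5, -1.0])
    p_exp_family = g(eta) * np.exp(eta @ u(x))            # h(x) = 1, eq. (9)
    p_direct = multivariate_normal(mu, Sigma).pdf(x)      # eq. (8)
    print(p_exp_family, p_direct)                         # the two values agree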

3 Implementation of Expectation Propagation

In this section, we discuss the implementation of expectation propagation. Our factorization assumes that

    p(\theta \mid x) = \frac{1}{p(x)} \prod_i f_i(\theta),                (15)

and we want to approximate p(θ|x) using q(θ) as given by

    q(\theta) = \frac{1}{Z} \prod_i \tilde f_i(\theta).                   (16)

First, we calculate KL(p‖q):

    KL(p \,\|\, q) = \int p(\theta) \log\frac{p(\theta)}{q(\theta)}\, d\theta = -\log g(\eta) - \eta^T E_{p(\theta)}[u(\theta)] + \text{constant}.    (17)

To minimize KL(p‖q), we form the gradient and set it equal to zero:

    \frac{\partial KL}{\partial \eta} = -\nabla_\eta \log g(\eta) - E_{p(\theta)}[u(\theta)] = 0.    (18)

Note that since q is determined by η, optimization with respect to q is equivalent to optimization with respect to η. Therefore, we have

    -\nabla_\eta \log g(\eta) = E_{p(\theta)}[u(\theta)].                 (19)

Since q is an exponential-family density of the form (7), it integrates to one:

    \int q(\theta)\, d\theta = 1, \qquad g(\eta) \int h(\theta) \exp\{\eta^T u(\theta)\}\, d\theta = 1.    (20)

Now, we can differentiate this equation with respect to η to obtain

    \nabla_\eta g(\eta) \int h(\theta) \exp\{\eta^T u(\theta)\}\, d\theta + g(\eta) \int h(\theta) \exp\{\eta^T u(\theta)\}\, u(\theta)\, d\theta = 0,    (21)

where we have used Leibniz's rule. (It is straightforward to show that if x = (x_1, ..., x_n) and a = (a_1, ..., a_n) and f(x) = exp{xᵀa}, then taking derivatives gives

    \frac{\partial f(x)}{\partial x} = a \exp\{x^T a\} = a f(x).)          (22)

Figure 1: Approximation of the individual probability distributions f_1(x) and f_2(x) by f̃_1(x) and f̃_2(x).

Figure 2: The product of the approximate individual distributions f̃_1(x) f̃_2(x) fails to be a good approximation of the product f_1(x) f_2(x), while a good approximation could be obtained by approximating the product directly.

It is easy to see that the latter term in (21) is E_{q(θ)}[u(θ)]. Therefore, our optimality conditions reduce to

    -\frac{\nabla_\eta g(\eta)}{g(\eta)} = E_{q(\theta)}[u(\theta)].      (23)

Thus, the whole problem reduces to

    E_{q(\theta)}[u(\theta)] = E_{p(\theta)}[u(\theta)].                  (24)

In other words, at each step of the optimization, we need to match the moments between p and q. However, solving this equation is itself intractable, since it requires computing expectations with respect to the original distribution p. But we have already decided that p is intractable, which is why we are using an approximation in the first place.

A plausible simplification would be to approximate each factor in p using one factor in q. That is, we could optimize by matching the moments between f_i(θ) and f̃_i(θ). However, by doing so we restrict ourselves to a small subset of the feasible region. In other words, we eliminate many candidate solutions that effectively minimize KL(p‖q), but for which the individual moments of f̃_i do not match those of f_i. For example, consider Figure 1, where f_1(x) has two modes while f_2(x) has only one. By minimizing the KL-divergence for each distribution separately, we obtain the approximations f̃_1(x) and f̃_2(x) shown in Figure 1. Now, consider f_1(x) f_2(x). Since f_2(x) ≈ 0 for x in the second mode of f_1(x), the product f_1(x) f_2(x) has only one mode, as shown in Figure 2, and is well approximated by minimizing the KL-divergence for the product directly. However, the product f̃_1(x) f̃_2(x) gives a distribution that poorly approximates f_1(x) f_2(x), as demonstrated in Figure 2.
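The point of Figures 1 and 2 can be reproduced numerically. The sketch below is an added illustration (the bimodal f_1, the unimodal f_2, and all numbers are arbitrary choices): it moment-matches a Gaussian to each factor separately and to the product, and shows that the product of the individually matched Gaussians is shifted and too broad, while moment-matching the product itself is accurate.

    import numpy as np
    from scipy.stats import norm

    x = np.linspace(-10, 10, 4001)
    dx = x[1] - x[0]

    # Illustrative factors: f1 is bimodal, f2 has a single mode near +4.
    f1 = 0.5 * norm.pdf(x, -4, 1) + 0.5 * norm.pdf(x, 4, 1)
    f2 = norm.pdf(x, 4, 3)

    def moment_match(p):
        # Return the Gaussian with the same mean and variance as the density p (eq. 24).
        p = p / (p.sum() * dx)
        mean = (x * p).sum() * dx
        var = ((x - mean) ** 2 * p).sum() * dx
        return norm.pdf(x, mean, np.sqrt(var))

    def stats(p):
        mu = (x * p).sum() * dx
        sd = np.sqrt(((x - mu) ** 2 * p).sum() * dx)
        return mu, sd

    # Factor-by-factor matching: approximate f1 and f2 separately, then multiply.
    prod_of_approx = moment_match(f1) * moment_match(f2)
    prod_of_approx /= prod_of_approx.sum() * dx

    # Matching the product directly.
    exact_prod = f1 * f2
    exact_prod /= exact_prod.sum() * dx
    approx_of_prod = moment_match(exact_prod)

    print("true product (mean, std):        ", stats(exact_prod))      # mean near +4
    print("product of approximations:       ", stats(prod_of_approx))  # pulled toward 0, too broad
    print("approximation of the product:    ", stats(approx_of_prod))  # close to the true product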

So, we need something more sophisticated than matching moments factor by factor. At each step of EP, we pick some j, keep all of the approximate factors except f̃_j, bring back the true factor f_j, and try to approximate the result. That is, instead of matching moments between each f_i and f̃_i, we match moments between two distributions that differ only in the single factor j. This way, if we have a large number of factors, we obtain a much better approximation than factor-by-factor matching, since the form of the approximation is not as restricted. To see this, let N be the total number of factors in p. In approximating factor by factor, we introduce N extra constraints into the original problem. In contrast, by omitting one factor at a time we only add a single extra constraint, which typically gives a much better result.

This leads to an iterative method for solving the approximate problem. First, we separate the factor f̃_j from the approximate distribution:

    q(\theta) = \tilde f_j(\theta) \prod_{i \neq j} \tilde f_i(\theta).   (25)

We define q^{\setminus j} as the distribution with factor j omitted:

    q^{\setminus j}(\theta) = \frac{q(\theta)}{\tilde f_j(\theta)}.       (26)

Now, we define q* as the distribution in which all the factors are from the approximating family except the j-th one, which is the corresponding factor of the original distribution:

    q^*(\theta) = \frac{1}{Z_j} f_j(\theta) \frac{q(\theta)}{\tilde f_j(\theta)} = \frac{1}{Z_j} f_j(\theta)\, q^{\setminus j}(\theta).    (27)

Then, we solve the simpler problem of minimizing the KL-divergence between q* and q,

    q^{\text{new}} = \arg\min_q KL(q^*(\theta) \,\|\, q(\theta)),          (28)

which is easily solved by moment matching. Finally, we can find the updated f̃_j from

    \tilde f_j(\theta) = K \frac{q^{\text{new}}(\theta)}{q^{\setminus j}(\theta)}.    (29)

Unfortunately, there are no theoretical guarantees for greedy, matching-pursuit-style algorithms of this kind. So, in general, we cannot say anything about the quality of the approximation produced by EP, and there exist examples where EP fails to satisfactorily minimize the KL-divergence. However, for large N, we have seen that EP at least outperforms moment matching on individual factors. And, despite the lack of convergence guarantees, EP works well in practice.
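To summarize (25)-(29) operationally, here is a generic EP skeleton in Python. It is an added sketch, not part of the notes: it assumes a scalar latent variable, Gaussian sites stored in natural form, and tilted moments computed by numerical integration on a grid; the helper names, the "soft step" factors, and all numbers are invented for illustration.

    import numpy as np

    def ep(true_factors, prior_mean, prior_var, n_iters=20):
        grid = np.linspace(-10, 10, 4001)
        dgrid = grid[1] - grid[0]
        n = len(true_factors)
        r_site = np.zeros(n)          # site precisions 1/v_i, initialised so that f~_i = 1
        rm_site = np.zeros(n)         # site precision * mean
        r_post = 1.0 / prior_var      # posterior natural parameters (prior f_0 kept exact)
        rm_post = prior_mean / prior_var

        for _ in range(n_iters):
            for j in range(n):
                # (26): cavity q^{\j} = q / f~_j  (subtract the site's natural parameters)
                r_cav, rm_cav = r_post - r_site[j], rm_post - rm_site[j]
                if r_cav <= 0:
                    continue          # skip an update that would give an improper cavity
                m_cav, v_cav = rm_cav / r_cav, 1.0 / r_cav
                # (27): tilted distribution q* proportional to f_j(theta) q^{\j}(theta)
                cavity = np.exp(-0.5 * (grid - m_cav) ** 2 / v_cav)
                tilted = true_factors[j](grid) * cavity
                tilted /= tilted.sum() * dgrid
                # (28): moment matching gives the new posterior q^{new}
                m_new = (grid * tilted).sum() * dgrid
                v_new = ((grid - m_new) ** 2 * tilted).sum() * dgrid
                r_post, rm_post = 1.0 / v_new, m_new / v_new
                # (29): recover the new site by dividing q^{new} by the cavity
                r_site[j], rm_site[j] = r_post - r_cav, rm_post - rm_cav
        return rm_post / r_post, 1.0 / r_post

    # Toy usage (assumed example): three "soft step" likelihood factors.
    factors = [lambda t, c=c: 1.0 / (1.0 + np.exp(-(t - c))) for c in (-1.0, 0.5, 2.0)]
    print(ep(factors, prior_mean=0.0, prior_var=4.0))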

4 Additional Examples

4.1 Clutter Problem

In the clutter problem discussed in [1], we have Gaussian observations of d-dimensional data x which are corrupted by noise and embedded in unrelated clutter. This gives us a Gaussian mixture model:

    p(y \mid x) = (1-w)\, \mathcal N(y;\, x,\, I) + w\, \mathcal N(y;\, 0,\, 10 I),    (30)

where I is the identity matrix and N(y; m, V) denotes the multivariate Gaussian distribution over y with mean m and covariance V. The first term in (30) is the (noise-corrupted) desired data, and the second term is diffuse Gaussian clutter. We assume that the data has a Gaussian prior:

    p(x) = \mathcal N(x;\, 0,\, 100 I).                                   (31)

Presumably, we pick a large variance to make the prior as non-informative as possible. If we have n independent observations D = {y_1, ..., y_n}, the joint distribution is given by

    p(D, x) = p(x) \prod_{i=1}^{n} p(y_i \mid x) = \prod_{i=0}^{n} f_i(x).    (32)

So, the factor f_0 is the Gaussian prior, and each additional f_i is a mixture-model likelihood function. Using EP, we find a Gaussian approximation to p(D, x). Specifically, we choose the spherical Gaussian distribution N(m_x, v_x I), whose components are uncorrelated and share the same variance. So, we need to find the parameters m_x and v_x that give the best approximation.

To find this approximation with EP, we first initialize the approximate terms f̃_i:

    \tilde f_i(x) = s_i \exp\left(-\frac{1}{2 v_i}(x - m_i)^T (x - m_i)\right),    (33)

where s_i, v_i, and m_i are the parameters of the i-th term. For f̃_0, we just initialize to the parameters of the prior, which is already Gaussian: v_0 = 100, s_0 = (2π v_0)^{-d/2}, and m_0 = 0. We initialize the data terms so that f̃_i = 1: v_i = ∞, m_i = 0, and s_i = 1. This gives the global parameters m_x = 0 and v_x = 100.

After initialization, we iterate until all (m_i, v_i, s_i) converge to within some small ε > 0 (in [1], ε = 10^{-4}). At each iteration, we perform the following steps for each i, where the superscript \setminus i denotes a quantity computed with term i removed (a code sketch of these updates is given after the list):

1. Remove the factor f̃_i from the current posterior estimate, giving

       (v_x^{\setminus i})^{-1} = v_x^{-1} - v_i^{-1},                    (34)
       m_x^{\setminus i} = m_x + v_x^{\setminus i} v_i^{-1} (m_x - m_i).  (35)

2. Recompute (m_x, v_x) from (m_x^{\setminus i}, v_x^{\setminus i}) via moment matching, and compute the normalization constant

       Z_i = (1-w)\, \mathcal N(y_i;\, m_x^{\setminus i},\, (v_x^{\setminus i} + 1) I) + w\, \mathcal N(y_i;\, 0,\, 10 I).    (36)

   That is, Z_i is the estimated likelihood factor evaluated at the observation y_i.

3. Update f̃_i:

       v_i^{-1} = v_x^{-1} - (v_x^{\setminus i})^{-1},                    (37)
       m_i = m_x^{\setminus i} + (v_i + v_x^{\setminus i})(v_x^{\setminus i})^{-1} (m_x - m_x^{\setminus i}),    (38)
       s_i = \frac{Z_i}{(2\pi v_i)^{d/2}\, \mathcal N(m_i;\, m_x^{\setminus i},\, (v_i + v_x^{\setminus i}) I)}.    (39)
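The following Python sketch implements steps 1-3 for the one-dimensional case d = 1, where the moment matching in step 2 can be done in closed form because the tilted distribution is a two-component Gaussian mixture. It is an added illustration: the weight w and the toy observations are arbitrary, and the scale updates of (39) are omitted since they only matter for the evidence computed below.

    import numpy as np
    from scipy.stats import norm

    w = 0.2                                   # clutter weight (assumed value)
    y = np.array([1.1, 0.7, 1.4, 8.0, 1.0])   # toy observations (8.0 plays the role of clutter)
    n = len(y)

    # Initialization: prior term (v_0 = 100, m_0 = 0) kept exact; data terms f~_i = 1.
    v = np.full(n, np.inf)
    m = np.zeros(n)
    v_x, m_x = 100.0, 0.0

    for sweep in range(50):
        for i in range(n):
            # Step 1, (34)-(35): remove f~_i to get the cavity (m^{\i}, v^{\i}).
            prec_cav = 1.0 / v_x - 1.0 / v[i]
            if prec_cav <= 0:
                continue                      # skip updates that would give an improper cavity
            v_cav = 1.0 / prec_cav
            m_cav = m_x + (v_cav / v[i]) * (m_x - m[i])
            # Step 2, (36): normalizer of the tilted distribution, then moment matching.
            p_signal = (1 - w) * norm.pdf(y[i], m_cav, np.sqrt(v_cav + 1))
            Z = p_signal + w * norm.pdf(y[i], 0.0, np.sqrt(10.0))
            r = p_signal / Z                                       # responsibility of the non-clutter part
            mean1 = m_cav + v_cav * (y[i] - m_cav) / (v_cav + 1)   # posterior mean if not clutter
            var1 = v_cav / (v_cav + 1)
            m_new = r * mean1 + (1 - r) * m_cav
            v_new = r * (var1 + mean1 ** 2) + (1 - r) * (v_cav + m_cav ** 2) - m_new ** 2
            m_x, v_x = m_new, v_new
            # Step 3, (37)-(38): recover the updated site parameters (s_i of (39) omitted).
            v[i] = 1.0 / (1.0 / v_x - 1.0 / v_cav)
            m[i] = m_cav + (v[i] + v_cav) / v_cav * (m_x - m_cav)

    print("posterior mean m_x =", m_x, " posterior variance v_x =", v_x)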

Finally, when the iterations terminate, we can use the approximated factors to compute the normalizing constant needed to perform inference:

    p(D) \approx (2\pi v_x)^{d/2} \exp(B/2) \prod_{i=0}^{n} s_i,          (40)

where

    B = \frac{m_x^T m_x}{v_x} - \sum_{i=0}^{n} \frac{m_i^T m_i}{v_i}.     (41)

4.2 Bayes Point Machine

Minka in [1] discusses the use of EP for the problem of Bayesian point classification. In this problem, we wish to classify points x into one of two classes y = ±1 using a weight vector w through the following rule:

    y = \operatorname{sign}(w^T x).                                       (42)

Given a training set D = {(x_1, y_1), ..., (x_n, y_n)}, we can write the likelihood for w as

    p(D \mid w) = \prod_{i=1}^{n} p(y_i \mid x_i, w) = \prod_{i=1}^{n} \phi\!\left(\frac{y_i\, w^T x_i}{\epsilon}\right),    (43)

where

    \phi(z) = \int_{-\infty}^{z} \mathcal N(t;\, 0,\, 1)\, dt             (44)

and ε is an error-tolerance parameter. Note that the factor φ(y_i wᵀx_i / ε) approaches a step function of y_i wᵀx_i as ε → 0. Minka derives the EP algorithm for this scenario assuming a Gaussian prior on w, a multivariate Gaussian posterior approximation, and exponential (Gaussian) forms for the approximate factors f̃_i(w); the resulting updates are summarized in the listing below. He shows that applying EP to the BPM problem yields superior results relative to other approximation methods previously available in the literature. The results are shown in Figure 3, plotted as classification error probability versus computational requirements (in FLOPS). It can be seen that EP not only performs better than the other algorithms in terms of classification error, but is also the most computationally efficient.
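As a small illustration of the likelihood (43)-(44), the code below (an added sketch with arbitrary numbers) evaluates the probit-style factor φ(y wᵀx / ε) for a few values of ε, showing how it hardens into the 0/1 step implied by the decision rule (42) as ε shrinks.

    import numpy as np
    from scipy.stats import norm

    def bpm_likelihood(w, x, y, eps):
        # One likelihood factor p(y | x, w) = phi(y * w.x / eps) from (43)-(44).
        return norm.cdf(y * (w @ x) / eps)

    w = np.array([1.0, -0.5])              # assumed weight vector
    x = np.array([0.3, 0.4])               # a point with w.x = 0.1 (weakly positive)
    for eps in (1.0, 0.1, 0.01):
        print(eps, bpm_likelihood(w, x, y=+1, eps=eps), bpm_likelihood(w, x, y=-1, eps=eps))
    # As eps -> 0 the factor tends to 1 for the label matching sign(w.x) and to 0 otherwise,
    # i.e. the hard rule y = sign(w^T x) of (42).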

The EP updates for the BPM are as follows, with the labels absorbed into the data (each x_i multiplied by its label y_i); a NumPy transcription of these updates is given after the listing.

1) Term approximations: f̃_i(w) = s_i exp( -(1/(2 v_i)) (wᵀx_i - m_i)² ). Initialize with v_i = ∞, m_i = 0, s_i = 1.

2) Posterior approximation: q(w) = N(m_w, V_w). Initialize with the prior: m_w = 0, V_w = I.

3) Loop over i = 1, 2, ..., n until convergence:

   a) Remove f̃_i(w) from the posterior:

          V_w^{\setminus i} = V_w + (V_w x_i)(V_w x_i)^T / (v_i - x_i^T V_w x_i)
          m_w^{\setminus i} = m_w + (V_w^{\setminus i} x_i) v_i^{-1} (x_i^T m_w - m_i)

   b) Recompute (m_w, V_w) via the following:

          z_i = (m_w^{\setminus i})^T x_i / \sqrt{x_i^T V_w^{\setminus i} x_i + 1}
          \alpha_i = [1 / \sqrt{x_i^T V_w^{\setminus i} x_i + 1}] \cdot \mathcal N(z_i; 0, 1) / \phi(z_i)
          m_w = m_w^{\setminus i} + V_w^{\setminus i} \alpha_i x_i
          V_w = V_w^{\setminus i} - (V_w^{\setminus i} x_i) [\alpha_i (x_i^T m_w + \alpha_i) / (x_i^T V_w^{\setminus i} x_i + 1)] (V_w^{\setminus i} x_i)^T

   c) Update f̃_i(w):

          v_i = (x_i^T V_w^{\setminus i} x_i + 1) / (\alpha_i (x_i^T m_w + \alpha_i)) - x_i^T V_w^{\setminus i} x_i
          m_i = x_i^T m_w^{\setminus i} + (v_i + x_i^T V_w^{\setminus i} x_i) \alpha_i
          s_i = \phi(z_i) \sqrt{1 + v_i^{-1} x_i^T V_w^{\setminus i} x_i} \exp\{ (x_i^T V_w^{\setminus i} x_i + 1) \alpha_i / (2 (x_i^T m_w + \alpha_i)) \}

4) Compute:

          B = m_w^T V_w^{-1} m_w - \sum_{i=1}^{n} m_i^2 / v_i
          p(D) \approx |V_w|^{1/2} \exp(B/2) \prod_{i=1}^{n} s_i

Figure 3: Comparison of BPM using EP with other classical methods.
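The sketch below is a direct NumPy transcription of steps 1)-3) of the listing for a toy two-dimensional data set (step 4, the evidence, is omitted). It is an added illustration, not from the notes or [1]: the synthetic clusters, the random seed, and the fixed number of sweeps are arbitrary choices, and the labels are absorbed by multiplying each x_i by y_i as assumed in the listing.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n_per = 15
    Xp = rng.normal(size=(n_per, 2)) + 2.0     # class +1 cluster around (+2, +2)
    Xm = rng.normal(size=(n_per, 2)) - 2.0     # class -1 cluster around (-2, -2)
    X = np.vstack([Xp, -Xm])                   # absorb labels: x_i <- y_i * x_i
    n, d = X.shape

    v = np.full(n, np.inf); m = np.zeros(n)    # site parameters, step 1)
    m_w, V_w = np.zeros(d), np.eye(d)          # posterior approximation, step 2)

    for sweep in range(20):                    # step 3)
        for i in range(n):
            x = X[i]
            # a) remove site i from the posterior (cavity distribution)
            Vx = V_w @ x
            V_cav = V_w + np.outer(Vx, Vx) / (v[i] - x @ Vx)
            m_cav = m_w + (V_cav @ x) / v[i] * (x @ m_w - m[i])
            # b) moment matching against the probit factor
            c = x @ V_cav @ x + 1.0
            z = (m_cav @ x) / np.sqrt(c)
            alpha = norm.pdf(z) / (norm.cdf(z) * np.sqrt(c))
            m_w = m_cav + (V_cav @ x) * alpha
            Vcx = V_cav @ x
            V_w = V_cav - np.outer(Vcx, Vcx) * (alpha * (x @ m_w + alpha) / c)
            # c) recover the updated site parameters
            v[i] = c / (alpha * (x @ m_w + alpha)) - (c - 1.0)
            m[i] = x @ m_cav + (v[i] + c - 1.0) * alpha

    print("approximate posterior mean of w:", m_w)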

References

[1] T. Minka, "Expectation propagation for approximate Bayesian inference," in Proc. 17th Conf. on Uncertainty in Artificial Intelligence (UAI), 2001.

[2] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
