A tutorial on variational Bayesian inference. Charles W. Fox & Stephen J. Roberts


A tutorial on variational Bayesian inference
Charles W. Fox & Stephen J. Roberts
Artificial Intelligence Review: An International Science and Engineering Journal (Artif Intell Rev)


A tutorial on variational Bayesian inference

Charles W. Fox · Stephen J. Roberts

© Springer Science+Business Media B.V.

C. W. Fox
Adaptive Behaviour Research Group, University of Sheffield, Sheffield, UK
e-mail: charles.fox@sheffield.ac.uk

S. J. Roberts
Pattern Analysis and Machine Learning Research Group, Department of Engineering Science, University of Oxford, Oxford, UK

Abstract  This tutorial describes the mean-field variational Bayesian approximation to inference in graphical models, using modern machine learning terminology rather than statistical physics concepts. It begins by seeking to find an approximate mean-field distribution close to the target joint in the KL-divergence sense. It then derives local node updates and reviews the recent Variational Message Passing framework.

Keywords  Variational Bayes · Mean-field · Tutorial

1 Introduction

Variational methods have recently become popular in the context of inference problems (Attias 2000; Winn and Bishop 2005). Variational Bayes (VB) is a particular variational method which aims to find some approximate joint distribution $Q(x; \theta)$ over hidden variables $x$ to approximate the true joint $P(x)$, and defines closeness as the KL divergence $KL[Q(x; \theta) \| P(x)]$. The mean-field form of VB assumes that $Q$ factorises into single-variable factors, $Q(x) = \prod_i Q_i(x_i | \theta_i)$. The asymmetric $KL[Q \| P]$ is chosen in VB principally to yield useful computational simplifications, but it can also be viewed as preferring approximations that are accurate in regions of high $Q$, rather than in regions of high $P$. This is often useful because if we were to draw samples from, or integrate over, $Q$, then the regions used would be largely accurate (though they may well miss regions of high $P$).
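This preference can be made concrete with a small numerical sketch (an assumed illustration, not an example from the paper): a narrow $Q$ sitting on one mode of a bimodal $P$ gives a small $KL[Q \| P]$ but a large $KL[P \| Q]$.

```python
import numpy as np

# Illustrative sketch (assumed example): asymmetry of KL[Q||P] versus KL[P||Q].
# P is a bimodal mixture; Q is a single Gaussian sitting on one of P's modes.
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

def normal_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

P = 0.5 * normal_pdf(x, -3.0, 1.0) + 0.5 * normal_pdf(x, 3.0, 1.0)
Q = normal_pdf(x, 3.0, 1.0)

def kl(a, b):
    # discretised KL[a||b] = integral of a ln(a/b) dx
    eps = 1e-300                      # guard against log(0) in empty regions
    return float(np.sum(a * (np.log(a + eps) - np.log(b + eps))) * dx)

print("KL[Q||P] =", kl(Q, P))   # small: Q is accurate wherever Q itself has mass
print("KL[P||Q] =", kl(P, Q))   # large: Q is penalised for missing P's other mode
```

Minimising the first quantity, as VB does, is content to ignore the missed mode; minimising the second would force $Q$ to spread over both.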

1.1 Problem statement

We wish to find a set of distributions $\{Q_i(x_i; \theta_i)\}$ to minimise the KL-divergence:

$$KL[Q(x) \| P(x|D)] = \int dx \, Q(x) \ln \frac{Q(x)}{P(x|D)}$$

where $D$ is the observed data, $x$ are the unobserved variables, and

$$Q(x) = \prod_i Q_i(x_i | \theta_i).$$

We sometimes omit the notational dependence of $Q_i$ on $\theta_i$ for clarity. As the $Q_i$ are approximate beliefs, they are subject to the normalisation constraints:

$$\forall i. \quad \int dx_i \, Q_i(x_i) = 1$$

1.2 VB approximates joints, not marginals

$Q$ approximates the joint, but the individual $Q_i(x_i)$ can be poor approximations to the true marginals $P_i(x_i)$. The $Q_i(x_i)$ components should not be expected to resemble even remotely the true marginals, for example when inspecting the computational states of network nodes. This makes VB considerably harder to debug than algorithms whose node states do have some local interpretation: the optimal VB components can be highly counterintuitive and make sense only in the global context.

Figure 1a shows a graphical model of an extreme example of mean-field VB breaking down, as a warning for the algorithms that follow. Suppose a factory produces screws of unknown diameter $\mu$. We know the machines are very accurate, so the precision $\gamma = 1/\sigma^2$ of the diameters is high. However, no-one has told us what the mean diameter is, except for a drunken engineer in a pub who said it was 5 mm. We might have a prior belief $\pi$ that $\mu$ is, say, 5 ± 4 mm (the 4 mm deviation reflecting our low confidence in the report). Suppose we are given a sealed box containing one screw. What is our belief in its diameter $x$? The exact marginal belief in $x$ should be almost identical to our belief in $\mu$, i.e. 5 ± 4 mm. Figure 1b shows one standard deviation of the true Gaussian joint $P(\mu, x)$ and the best mean-field Gaussian approximate joint $Q(\mu, x)$. As discussed above, this $Q$ is chosen to minimise error in its own domain, so it appears as a tight Gaussian. Importantly, the marginal $Q_x(x)$ is now very narrow, and not at all equal to the marginal $P(x)$. We emphasise such cases because we found them to be the major cause of debugging time during development. (A small numerical illustration of this effect is given at the end of this subsection.)

We also want to find the model likelihood $P(D|M)$ for model comparison. This quantity will appear naturally as we try to minimise the KL divergence. There are many ways to think about the following derivations, which we will see involve a balance of three terms called the lower bound, the energy and the entropy. The derivation presented here begins by aiming to minimise the above KL divergence; other equivalent derivations may begin by aiming to maximise the lower bound.
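The screw example can be reproduced numerically. The sketch below is an assumed illustration: it uses the standard mean-field result (not derived in this tutorial) that for a Gaussian target with precision matrix $\Lambda$ the best factorised Gaussian has factor precisions equal to the diagonal entries $\Lambda_{ii}$, and an assumed machine tolerance of 0.01 mm.

```python
import numpy as np

# Minimal sketch of the screw example in Sect. 1.2 (machine noise level is an assumption).
# Prior: mu ~ N(5, 4^2).  Observation model: x ~ N(mu, 1/gamma) with gamma large.
prior_mean, prior_var = 5.0, 4.0 ** 2
gamma = 1.0 / 0.01 ** 2                      # very accurate machines: sd 0.01 mm (assumed)

# Joint P(mu, x) is Gaussian with precision matrix Lambda (variable order: [mu, x]).
Lam = np.array([[1.0 / prior_var + gamma, -gamma],
                [-gamma,                   gamma]])
Sigma = np.linalg.inv(Lam)                   # true joint covariance

# True marginal sd of x: essentially the prior sd (~4 mm), since the screw just copies mu.
true_sd_x = np.sqrt(Sigma[1, 1])

# Best mean-field Gaussian Q(mu)Q(x): each factor's precision is the corresponding
# diagonal entry of Lambda (standard mean-field result for Gaussian targets).
vb_sd_x = np.sqrt(1.0 / Lam[1, 1])

print(f"true marginal sd of x : {true_sd_x:.3f} mm")    # ~4.0 mm
print(f"mean-field Q_x sd     : {vb_sd_x:.5f} mm")       # ~0.01 mm: far too confident
```

The true marginal of $x$ keeps roughly the prior's 4 mm spread, while the mean-field factor collapses to the machine tolerance, exactly the failure mode warned about above.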

Fig. 1  a Graphical model for a population mean problem. Square nodes indicate observed variables. b True joint P and VB approximation Q

1.3 Rewriting KL optimisation as an easier problem

We will rewrite the KL equation in terms that are more tractable. First we flip the numerator and denominator, and flip the sign:

$$KL[Q(x) \| P(x|D)] = \int dx \, Q(x) \ln \frac{Q(x)}{P(x|D)} = -\int dx \, Q(x) \ln \frac{P(x|D)}{Q(x)}$$

Next, replace the conditional $P(x|D)$ with a joint $P(x, D)$ and a model likelihood $P(D)$. The reason for making this rewrite is that for Bayesian networks with exponential family nodes, the $\ln P(x, D)$ term will be a very simple sum of node energy terms, whereas $\ln P(x|D)$ is more complicated. This will simplify later computations.

$$P(x, D) = P(x|D) \, P(D) \quad \Rightarrow \quad \ln P(x|D) = \ln P(x, D) - \ln P(D)$$

$$KL[Q(x) \| P(x|D)] = -\int dx \, Q(x) \left( \ln \frac{P(x, D)}{Q(x)} - \ln P(D) \right)$$

The $\ln P(D)$ term does not involve $Q$, so we can ignore it for the purposes of our minimisation. Finally, define

$$L[Q(x)] = \int dx \, Q(x) \ln \frac{P(x, D)}{Q(x)}$$

so that

$$KL[Q(x) \| P(x|D)] = -L[Q(x)] + \ln P(D)$$

So to minimise the KL divergence, we must maximise $L$. The maximisation is still subject to the normalisation constraints:

$$\forall i. \quad \int dx_i \, Q_i(x_i) = 1$$
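The identity just derived is easy to check exactly on a toy model. In the sketch below (the probability table and the factorised $Q$ are arbitrary assumptions), two binary hidden variables allow $L[Q]$, $\ln P(D)$ and the KL divergence to be enumerated directly.

```python
import numpy as np

# Sketch checking the identity of Sect. 1.3 on a toy discrete model (tables are made up):
# KL[Q(x) || P(x|D)] = ln P(D) - L[Q],  with  L[Q] = sum_x Q(x) ln( P(x,D) / Q(x) ).
rng = np.random.default_rng(0)

# Joint P(x1, x2, D=observed) over two binary hidden variables, for one fixed observation.
P_xD = rng.random((2, 2)) / 10.0       # arbitrary positive entries; they sum to P(D) < 1

# A factorised (mean-field) Q(x) = Q1(x1) Q2(x2), chosen arbitrarily.
Q1 = np.array([0.3, 0.7])
Q2 = np.array([0.6, 0.4])
Q = np.outer(Q1, Q2)

lnPD = np.log(P_xD.sum())                       # model evidence ln P(D)
post = P_xD / P_xD.sum()                        # exact posterior P(x | D)
L = np.sum(Q * (np.log(P_xD) - np.log(Q)))      # lower bound L[Q]
KL = np.sum(Q * (np.log(Q) - np.log(post)))     # KL[Q || P(x|D)]

print(L + KL, lnPD)          # the two numbers agree
print(L <= lnPD)             # True: L is a lower bound on ln P(D)
```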

The values $L[Q(x)]$ are lower bounds on the model log-likelihood $\ln P(D)$, where $P(D) = P(D|M)$ (we generally drop the $M$ notation as we are working with a single model only). The best bound is thus achieved when $L[Q(x)]$ is maximised over $Q$. The reason for $L$ being a lower bound is seen by rearranging as:

$$\ln P(D) = L[Q(x)] + KL[Q(x) \| P(x|D)]$$

Thus when the KL-divergence is zero (a perfect fit), $L$ is equal to the model log-likelihood. When the fit is not perfect, the KL-divergence is always positive and so $L[Q(x)] < \ln P(D)$. Another rearrangement gives

$$L[Q(x)] = \ln P(D) - KL[Q(x) \| P(x|D)]$$

showing that the KL divergence is the error between $L$ and $\ln P(D)$.

1.4 Solution of free energy optimisation

We now wish to find $Q$ to maximise the lower bound, subject to normalisation constraints:

$$L[Q(x)] = \int dx \, Q(x) \ln \frac{P(x, D)}{Q(x)} = \int dx \, Q(x) \ln P(x, D) - \int dx \, Q(x) \ln Q(x) = \langle E(x, D) \rangle_{Q(x)} + H[Q(x)]$$

where we define the energy as $E = \ln P$ and the entropy (Shannon entropy, used by convention) as $H[Q(x)] = -\int dx \, Q(x) \ln Q(x)$. For exponential models, this will become a convenient sum of linear functions. By the mean-field assumption:

$$L[Q(x)] = \int dx \left( \prod_i Q_i(x_i) \right) E(x, D) - \sum_i \int dx \left( \prod_k Q_k(x_k) \right) \ln Q_i(x_i) \qquad (1)$$

Consider the entropy (rightmost) term. We can bring out the sum, and use the partition $x = \{x_i, \bar{x}_i\}$, where $\bar{x}_i = x \setminus x_i$:

$$\sum_i \int dx \left( \prod_k Q_k(x_k) \right) \ln Q_i(x_i) = \sum_i \int dx_i \, d\bar{x}_i \, Q(\bar{x}_i) Q_i(x_i) \ln Q_i(x_i) = \sum_i \int dx_i \, Q_i(x_i) \ln Q_i(x_i) \int d\bar{x}_i \, Q(\bar{x}_i) = \sum_i \int dx_i \, Q_i(x_i) \ln Q_i(x_i)$$

Substituting this into the right term of Eq. 1:

$$L[Q(x)] = \int dx \left( \prod_i Q_i(x_i) \right) E(x, D) - \sum_i \int dx_i \, Q_i(x_i) \ln Q_i(x_i) \qquad (2)$$
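The entropy simplification used between Eqs. 1 and 2 is just the additivity of entropy over independent factors, which a tiny sketch can confirm (the factor tables below are arbitrary assumptions):

```python
import numpy as np

# Sketch of the step from Eq. (1) to Eq. (2): for a factorised Q the joint entropy
# equals the sum of per-factor entropies.
Q1 = np.array([0.3, 0.7])
Q2 = np.array([0.2, 0.5, 0.3])
Q = np.outer(Q1, Q2)                                          # Q(x1, x2) = Q1(x1) Q2(x2)

H_joint = -np.sum(Q * np.log(Q))                              # H[Q(x)]
H_sum = -np.sum(Q1 * np.log(Q1)) - np.sum(Q2 * np.log(Q2))    # H[Q1] + H[Q2]
print(H_joint, H_sum)                                         # identical up to rounding
```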

Now look at and rearrange the energy (left) term of Eq. 2, again separating out one variable:

$$\int dx \left( \prod_i Q_i(x_i) \right) E(x, D) = \int dx_i \, Q_i(x_i) \int d\bar{x}_i \, Q(\bar{x}_i) E(x, D) = \int dx_i \, Q_i(x_i) \langle E(x, D) \rangle_{Q(\bar{x}_i)}$$

$$= \int dx_i \, Q_i(x_i) \ln \exp \langle E(x, D) \rangle_{Q(\bar{x}_i)} = \int dx_i \, Q_i(x_i) \ln Q_i^*(x_i) + \ln Z_i$$

where we have defined $Q_i^*(x_i) = \frac{1}{Z_i} \exp \langle E(x, D) \rangle_{Q(\bar{x}_i)}$ and $Z_i$ normalises $Q_i^*(x_i)$. Substituting this new form of the energy back into Eq. 2 yields

$$L[Q(x)] = \int dx_i \, Q_i(x_i) \ln Q_i^*(x_i) - \sum_j \int dx_j \, Q_j(x_j) \ln Q_j(x_j) + \ln Z_i$$

Separate out the entropy $H_i = H[Q_i(x_i)]$ from the rest of the entropy sum:

$$L[Q(x)] = \left\{ \int dx_i \, Q_i(x_i) \ln Q_i^*(x_i) - \int dx_i \, Q_i(x_i) \ln Q_i(x_i) \right\} + H[Q(\bar{x}_i)] + \ln Z_i$$

Consider the terms in the brackets:

$$\int dx_i \, Q_i(x_i) \ln Q_i^*(x_i) - \int dx_i \, Q_i(x_i) \ln Q_i(x_i) = \int dx_i \, Q_i(x_i) \ln \frac{Q_i^*(x_i)}{Q_i(x_i)} = -KL[Q_i(x_i) \| Q_i^*(x_i)]$$

What a lucky coincidence! Though we started by trying to minimise the KL-divergence between large joint distributions (which is hard), we have converted the problem to that of minimising KL-divergences between individual one-dimensional distributions (which is easier). Write:

$$L[Q(x)] = -KL[Q_i(x_i) \| Q_i^*(x_i)] + H[Q(\bar{x}_i)] + \ln Z_i$$

Thus $L$ depends on each individual $Q_i$ only through the KL term. We wish to maximise $L$ with respect to each $Q_i$, subject to the constraint that all $Q_i$ are normalised to unity. This could be achieved by Lagrange multipliers and functional differentiation:

$$\frac{\delta}{\delta Q_i(x_i)} \left[ KL[Q_i(x_i) \| Q_i^*(x_i)] - \lambda \left( \int Q_i(x_i) \, dx_i - 1 \right) \right] := 0$$

A long algebraic derivation would then eventually lead to a Gibbs distribution. However, thanks to the KL form rearrangement we do not need to perform any of this, because we can see immediately that $L$ will be maximised when the KL divergence is zero, hence when $Q_i(x_i) = Q_i^*(x_i)$ (the normalisation constraint on $Q_i$ is satisfied thanks to the inclusion of $Z_i$ in the previous definition of $Q_i^*$). Expanding back the definition gives the optimal $Q_i$ to be

$$Q_i(x_i) = \frac{1}{Z_i} \exp \langle E(x_i, \bar{x}_i, D) \rangle_{Q(\bar{x}_i)}$$

where $E(x_i, \bar{x}_i, D) = \ln P(x_i, \bar{x}_i, D)$ is the energy.
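Setting each factor to this optimal form in turn, and repeating, gives a simple coordinate-ascent algorithm (the scheme formalised in Sect. 1.5 below). A minimal sketch on a toy discrete model (the probability table is an arbitrary assumption) is:

```python
import numpy as np

# Sketch of the update Q_i(x_i) = (1/Z_i) exp< E(x, D) >_{Q(other variables)},
# applied cyclically to a toy discrete model with two binary hidden variables.
rng = np.random.default_rng(1)
P_xD = rng.random((2, 2)) + 0.1            # P(x1, x2, D) for one fixed observation D
lnP = np.log(P_xD)                         # the energy E(x, D) = ln P(x, D)

Q1 = np.array([0.5, 0.5])
Q2 = np.array([0.5, 0.5])

def bound(Q1, Q2):
    Q = np.outer(Q1, Q2)
    return np.sum(Q * (lnP - np.log(Q)))   # L[Q] = <E> + H[Q]

for sweep in range(10):
    # update Q1: average the energy over Q2, exponentiate, normalise
    logq1 = lnP @ Q2
    Q1 = np.exp(logq1 - logq1.max()); Q1 /= Q1.sum()
    # update Q2 symmetrically
    logq2 = Q1 @ lnP
    Q2 = np.exp(logq2 - logq2.max()); Q2 /= Q2.sum()
    print(f"sweep {sweep}: L = {bound(Q1, Q2):.6f}")   # non-decreasing

print("Q1 =", Q1, " Q2 =", Q2)
```

Each update can only shrink the corresponding KL term, so the bound printed at each sweep never decreases.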

Fig. 2  Graphical model for variational example. Square nodes are observed

1.5 Solution as iterative update equations

Converting the equation above into an iterative update equation gives:

$$Q_i(x_i) \leftarrow \frac{1}{Z_i} \exp \langle E(x_i, \bar{x}_i, D) \rangle_{Q(\bar{x}_i)}$$

where $x_i$ is a hidden node to be updated, $D$ are the observed data, and $\bar{x}_i = x \setminus x_i$ are the other hidden nodes. These updates give us an EM-like algorithm, optimising one node at a time in the hope of converging to a local optimum.

Graphical models define conditional independence relationships between nodes, so for models having such structure there exists a Markov blanket set of nodes $mb(x_i)$ for each node $x_i$ such that $P(x_i | \bar{x}_i) = P(x_i | mb(x_i))$. For undirected graphical model links, the Markov blanket is the set of neighbouring nodes of $x_i$; for directed Bayesian networks it is the neighbours of $x_i$ augmented with the set of co-parents of $x_i$. By the definition of Markov blankets, we may write

$$\ln Q_i(x_i) \propto \langle \ln P(x_i, mb(x_i), D) \rangle_{Q(mb(x_i))}$$

where $mb(x_i)$ is the Markov blanket of the node $x_i$ of interest.

1.6 Example: variational Bayesian mean update

Until recently, variational implementations have consisted of calculating the iterative update equations by hand for each specific network model. These calculations required much algebra and were error-prone. We consider the simple graphical model shown in Fig. 2: the task is to infer the mean $\mu$ component of the posterior joint of a Gaussian population of unknown precision $\gamma$, given a set of $M$ observations $D = \{D_i\}_{i=1}^M$ drawn from this population. Making the mean-field approximation and assuming conjugate exponential priors we have:

$$Q(x) = Q(\mu) Q(\gamma) = N(\mu; m, \beta^{-1}) \, \Gamma(\gamma; a, b)$$

where $m, \beta, a$ and $b$ are constants specifying conjugate priors on the population parameters. The update equation is

$$\ln Q_i(x_i) \propto \langle \ln P(x_i, mb(x_i), D) \rangle_{Q(mb(x_i))}$$

Substituting the variables for the mean update, $x_i = \mu$:

$$\ln Q(\mu) \propto \int d\gamma \, \Gamma(\gamma; a, b) \ln P(\mu, \gamma, \{D_i\})$$

$$= \int d\gamma \, \Gamma(\gamma; a, b) \ln \Big[ N(\mu; m, \beta^{-1}) \, \Gamma(\gamma; a, b) \prod_i N(D_i | \mu, \gamma) \Big]$$

$$= \int d\gamma \, \Gamma(\gamma; a, b) \Big[ \ln N(\mu; m, \beta^{-1}) + \ln \Gamma(\gamma; a, b) + \sum_i \ln N(D_i | \mu, \gamma) \Big]$$

$$= \int d\gamma \, \Gamma(\gamma; a, b) \ln N(\mu; m, \beta^{-1}) + \int d\gamma \, \Gamma(\gamma; a, b) \ln \Gamma(\gamma; a, b) + \sum_i \int d\gamma \, \Gamma(\gamma; a, b) \ln N(D_i | \mu, \gamma)$$

This simplifies greatly. First, integrands not dependent on $\mu$ can be discarded and replaced by a normalising $Z$, since we are only interested in the PDF for $Q(\mu)$:

$$\ln Q(\mu) = \int d\gamma \, \Gamma(\gamma; a, b) \ln N(\mu; m, \beta^{-1}) + \sum_i \int d\gamma \, \Gamma(\gamma; a, b) \ln N(D_i | \mu, \gamma) + \ln \frac{1}{Z}$$

The left term does depend on $\mu$ but not on $\gamma$, so the integral over the normalised $Q(\gamma)$ has no effect. Its integral is simply its integrand, so we can write

$$\ln Q(\mu) = \ln N(\mu; m, \beta^{-1}) + \sum_i \int d\gamma \, \Gamma(\gamma; a, b) \ln N(D_i | \mu, \gamma) + \ln \frac{1}{Z}$$

We next consider the integral containing data $D_i$. Writing out its log Gaussian energy expression in full, this term becomes

$$\int d\gamma \, \Gamma(\gamma; a, b) \ln N(D_i | \mu, \gamma) = \int d\gamma \, \Gamma(\gamma; a, b) \Big[ c - \tfrac{1}{2} (D_i - \mu) \gamma (D_i - \mu) \Big]$$

where $c$ is its normalising constant. Rewriting up to normalisation gives

$$-\tfrac{1}{2} (D_i - \mu) \Big[ \int d\gamma \, \Gamma(\gamma; a, b) \, \gamma \Big] (D_i - \mu)$$

The required integral is now just the expectation (first moment) of a Gamma distribution, which can be found in standard statistics tables:

$$\langle \gamma \rangle = E[\Gamma(\gamma; a, b)] = a b^{-1}$$

(We notate pre- and post-multiplications by $(D_i - \mu)$ rather than a single multiplication by $(D_i - \mu)^2$ so that the multivariate Wishart case follows the same derivation and notation.) The term is now

$$-\tfrac{1}{2} (D_i - \mu) \langle \gamma \rangle (D_i - \mu)$$

Substituting this back into the equation for $\ln Q(\mu)$ gives

$$\ln Q(\mu) = \ln N(\mu; m, \beta^{-1}) - \tfrac{1}{2} \sum_i (D_i - \mu) \langle \gamma \rangle (D_i - \mu) + \ln \frac{1}{Z}$$

$$Q(\mu) = \frac{1}{Z} N(\mu; m, \beta^{-1}) \exp \Big\{ -\tfrac{1}{2} \sum_i (D_i - \mu) \langle \gamma \rangle (D_i - \mu) \Big\}$$

$$Q(\mu) = \frac{1}{Z} N(\mu; m, \beta^{-1}) \prod_i N(D_i | \mu, \langle \gamma \rangle)$$

Switching round the dependent variable we obtain the following. (For $\mu$ in a Gaussian distribution, the flipped result is still Gaussian; for other distributions, it will be the conjugate form.)

$$Q(\mu) = \frac{1}{Z} N(\mu; m, \beta^{-1}) \prod_i N(\mu | D_i, \langle \gamma \rangle)$$

Finally we can use the standard equation for a product of Gaussians to give

$$Q(\mu) = N(\mu | m', \beta'^{-1})$$

with

$$\beta' = \beta + M \langle \gamma \rangle, \qquad m' = \beta'^{-1} \Big( \beta m + \langle \gamma \rangle \sum_{i=1}^M D_i \Big)$$

The above illustrates how the general VB updates can be transformed into particular updates for graphical models. The $Q(\mu)$ component is meaningless by itself, as VB aims to approximate the full joint rather than local marginals, so a more useful analysis would repeat the above to obtain the $Q(\gamma)$ updates as well, requiring even more algebra. We see that this process is time-consuming and error-prone even for very simple networks such as the Gaussian population used here. (A small numerical sketch of these closed-form updates is given below, after Sect. 1.7.)

1.7 Variational Bayes with message passing

The previous method of by-hand derivation was time-consuming and tedious, though until recently it was state-of-the-art. However, the recent variational message passing (VMP) algorithm (Bishop et al. 2002; Winn and Bishop 2005) has shown how to automate these derivations in the case of conjugate-exponential networks. For such networks, the updates all have a standard form, involving the sufficient statistics and natural parameters of the particular node types. For unavoidably non-conjugate-exponential nodes (such as mixture models in particular) it is possible to make further approximations to bring them into the standard form. For conjugate-exponential networks, VMP should now be the standard method for variational Bayesian inference, replacing derivations by hand. For non-conjugate-exponential networks, VMP may still be useful if fast approximations are required at the expense of accuracy.
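As promised above, the closed-form mean update of Sect. 1.6 can be exercised in a few lines. The sketch below is illustrative only: the prior hyperparameters, the current Gamma factor, and the data are all assumed values.

```python
import numpy as np

# Sketch of the closed-form Q(mu) update derived in Sect. 1.6.
# Prior hyperparameters, the current Q(gamma), and the data are illustrative assumptions.
rng = np.random.default_rng(2)

m, beta = 5.0, 1.0 / 4.0 ** 2        # prior N(mu; m, beta^{-1}), i.e. 5 +/- 4 mm
a, b = 3.0, 0.03                     # current Gamma factor Q(gamma; a, b)
D = rng.normal(4.2, 0.1, size=20)    # M = 20 observations
M = len(D)

gamma_mean = a / b                                    # <gamma> = a b^{-1}
beta_new = beta + M * gamma_mean                      # beta' = beta + M <gamma>
m_new = (beta * m + gamma_mean * D.sum()) / beta_new  # m' = beta'^{-1} (beta m + <gamma> sum_i D_i)

print(f"Q(mu) = N(mu; {m_new:.4f}, sd {beta_new ** -0.5:.4f})")
```

A full implementation would alternate this with the corresponding $Q(\gamma)$ update; VMP, introduced in Sect. 1.7, generates both automatically.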

1.8 VMP details

A standard theorem (Bernardo and Smith 2000) about exponential family distributions shows that the expectations of the sufficient statistics are given simply by:

$$\langle u(x_i) \rangle = -\nabla_{\phi} \, g(\phi)$$

where the del symbol ($\nabla$) means we form a vector whose $r$th component is the derivative of $g$ with respect to the $r$th component of the vector $\phi$.

We will write $x_i$'s set of parents as $pa(x_i)$; its set of children as $ch(x_i)$; its set of co-parents with respect to a particular child $ch$ as $cop(x_i; ch)$; and its set of co-parents over all children as $cop(x_i)$. We wish to compute the variational Bayesian update as in the previous section:

$$\ln Q_i(x_i) \propto \langle \ln P(x_i, mb(x_i), D) \rangle_{Q(mb(x_i))}$$

Assuming the effects of $D$ are already in the Markov blanket nodes, and separating, this becomes

$$\langle \ln P(pa(x_i)) + \ln P(cop(x_i)) + \ln P(x_i | pa(x_i)) + \ln P(ch(x_i) | x_i, cop(x_i)) \rangle_{Q(mb(x_i))}$$

Simplifying and dropping constant parent and co-parent terms:

$$= \langle \ln P(x_i | pa(x_i)) \rangle_{Q(pa(x_i))} + \langle \ln P(ch(x_i) | x_i, cop(x_i)) \rangle_{Q(ch(x_i), cop(x_i))}$$

The children separate:

$$= \langle \ln P(x_i | pa(x_i)) \rangle_{Q(pa(x_i))} + \sum_{ch \in ch(x_i)} \langle \ln P(ch | x_i, cop(x_i; ch)) \rangle_{Q(ch, cop(x_i; ch))}$$

We will consider the two parts one at a time.

Messages from parents

A conjugate-exponential node $x_i$ is parametrised by a natural parameter vector $\phi_i$. By the definition of such nodes,

$$\langle \ln P(x_i | pa(x_i)) \rangle_{Q(pa(x_i))} = \langle \phi_i^{\top} u_i(x_i) + f_i(x_i) + g_i(\phi_i) \rangle_{Q(pa(x_i))} = \langle \phi_i \rangle_{Q(pa(x_i))}^{\top} u_i(x_i) + f_i(x_i) + \langle g_i(\phi_i) \rangle_{Q(pa(x_i))}$$

As $\phi_i$ and $g_i$ are multi-linear functions of the parent sufficient statistics (by construction), and using the mean-field assumption, we can simply take their formulae (defined as conditional on single values of the parents) and substitute the expectations of the sufficient statistics, to get the expectation of the whole expression as required. So parents of $x_i$ need only send their sufficient statistic expectations to $x_i$ as messages. (A concrete example is shown in Sect. 1.9.)

Messages to parents

A key property of the exponential family is that we can multiply (fuse) similar distributions by adding their natural parameter vectors $\phi$ (up to normalisation):

$$\exp\{\phi_1^{\top} u(x_i) + f(x_i) + g(\phi_1)\} \cdot \exp\{\phi_2^{\top} u(x_i) + f(x_i) + g(\phi_2)\} \propto \exp\{(\phi_1 + \phi_2)^{\top} u(x_i) + f(x_i) + g(\phi_1 + \phi_2)\}$$
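This fusion property can be checked quickly for Gaussian factors over the same variable, using the natural parameterisation $\phi = [m\beta, -\beta/2]$ introduced in Sect. 1.9 (the two example densities below are arbitrary assumptions):

```python
import numpy as np

# Sketch of the fusion property: multiplying two exponential-family distributions over the
# same variable adds their natural parameter vectors. Shown here for univariate Gaussians.
def nat(m, beta):
    return np.array([m * beta, -beta / 2.0])    # phi = [m*beta, -beta/2]

def from_nat(phi):
    beta = -2.0 * phi[1]
    return phi[0] / beta, beta                  # recover (mean, precision)

m1, b1 = 0.0, 1.0
m2, b2 = 3.0, 4.0
m_fused, b_fused = from_nat(nat(m1, b1) + nat(m2, b2))
print("fused:", m_fused, b_fused)               # mean 2.4, precision 5.0

# Cross-check on a grid: the renormalised pointwise product of the two densities
# matches the Gaussian with the fused parameters.
x = np.linspace(-6.0, 8.0, 2001); dx = x[1] - x[0]
pdf = lambda x, m, b: np.sqrt(b / (2 * np.pi)) * np.exp(-0.5 * b * (x - m) ** 2)
prod = pdf(x, m1, b1) * pdf(x, m2, b2)
prod /= prod.sum() * dx
print("max abs difference:", np.max(np.abs(prod - pdf(x, m_fused, b_fused))))
```

This is the property that lets a parent absorb child messages simply by adding them to its own parameters, as described next.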

A second property is that, by conjugacy, $\phi$ and $g$ are also multi-linear in the parental sufficient statistics. So we can always rearrange the formula, by finding functions $\phi_j, f_j, g_j$, to make it look like a function of a parent $x_j \in pa(x_i)$:

$$\langle \ln P(x_i | pa(x_i)) \rangle_{Q(pa(x_i))} = \langle \phi_j^{\top} u_j(x_j) + f_j(x_j) + g_j(\phi_j) \rangle_{Q(pa(x_i))}$$

As before, we may handle the expectation by using the multi-linear property to push all the expectations tight around the sufficient statistics. So from the point of view of the parent, this is written in terms of the sufficient statistic expectations of its child and co-parents. We can thus pass a likelihood message consisting of

$$\phi_j \Big( \langle u(x_i) \rangle, \, \{ \langle u(cop) \rangle \}_{cop \in cop(x_j; x_i)} \Big)$$

The parent may then simply add these to its prior parameters, by the first property. Observed data nodes $D$ may be treated as delta distributions which send standard messages to their parents and children.

1.9 Example: mean, precision and data using VMP

To demonstrate the power of the VMP formalism, we consider the same scenario as used to hand-derive the VB updates in Sect. 1.6. Using VMP we can substitute the particular Gaussian and Gamma distributions into the VMP update equations and quickly obtain all network messages for mean, precision, and data nodes as follows. (Messages to $D$ are not required in this particular example, but are useful in general for making inferences about unobserved data.) The exponential forms used here are standard (Bernardo and Smith 2000).

Mean node (with known prior parameters)

Beginning with the conjugate-exponential form of the Gaussian distribution:

$$\ln P(\mu | m, \beta) = \begin{bmatrix} m\beta \\ -\beta/2 \end{bmatrix}^{\top} \begin{bmatrix} \mu \\ \mu^2 \end{bmatrix} + g(m, \beta) \qquad \text{with} \qquad g(m, \beta) = \tfrac{1}{2} (\ln \beta - \beta m^2 - \ln 2\pi)$$

The message to child $D$ is the expectation of the sufficient statistics:

$$\begin{bmatrix} \langle \mu \rangle \\ \langle \mu^2 \rangle \end{bmatrix} = -\nabla_{\phi} \, g(\phi) = \begin{bmatrix} m \\ m^2 + \beta^{-1} \end{bmatrix}$$

Computing the log-likelihood bound using VMP

VMP also makes computation of the log-likelihood bound simple. Recall that

$$\ln P(D|M) \geq L[Q(x)], \qquad L[Q(x)] = \int dx \, Q(x) \log \frac{P(x, D)}{Q(x)} = \langle \log P(x, D) \rangle_{Q(x)} - \langle \log Q(x) \rangle_{Q(x)}$$
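Before assembling this bound from per-node contributions (next), the mean-node child message stated above can be checked with a quick finite-difference sketch (the values of $m$ and $\beta$ below are assumptions):

```python
import numpy as np

# Sketch of the mean-node child message of Sect. 1.9: with phi = [m*beta, -beta/2],
# -grad_phi g(phi) should reproduce [<mu>, <mu^2>] = [m, m^2 + 1/beta].
m, beta = 1.3, 0.7
phi = np.array([m * beta, -beta / 2.0])

def g(phi):
    # g(m, beta) = 0.5 (ln beta - beta m^2 - ln 2 pi), rewritten as a function of phi
    beta_ = -2.0 * phi[1]
    m_ = phi[0] / beta_
    return 0.5 * (np.log(beta_) - beta_ * m_ ** 2 - np.log(2 * np.pi))

# numerical gradient of g by central differences
eps = 1e-6
grad = np.array([(g(phi + eps * np.eye(2)[r]) - g(phi - eps * np.eye(2)[r])) / (2 * eps)
                 for r in range(2)])

print("message -grad g(phi):", -grad)                     # ~ [1.3, 1.3^2 + 1/0.7]
print("[m, m^2 + 1/beta]   :", [m, m ** 2 + 1.0 / beta])
```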

Writing $L[Q(x)]$ as the sum of individual node contributions from the universe of all (data and hidden) nodes $\Omega = x \cup D$,

$$L[Q(x)] = \sum_{\omega \in \Omega} \Big( \langle \log P(\omega | pa(\omega)) \rangle_{Q(\omega, pa(\omega))} - \langle \log Q(\omega) \rangle_{Q(\omega)} \Big) = \sum_{\omega \in \Omega} L_{\omega}$$

with

$$L_{\omega} = \langle \log P(\omega | pa(\omega)) \rangle_{Q(\omega, pa(\omega))} - \langle \log Q(\omega) \rangle_{Q(\omega)} = \langle \phi^{\pi\,\top}_{\omega} u(\omega) + f(\omega) + g(\phi^{\pi}_{\omega}) \rangle_{Q(\omega, pa(\omega))} - \langle \phi^{\top}_{\omega} u(\omega) + f(\omega) + g(\phi_{\omega}) \rangle_{Q(\omega)}$$

where $\phi^{\pi}$ are the prior natural parameters conditioned only on the parents, and $\phi$ are the posterior parameters after child messages are fused. $Q(\omega)$ uses the posterior parameters. By multi-linearity we can push in the expectations and simplify to obtain:

$$L_{\omega} = \Big( \langle \phi^{\pi}_{\omega} \rangle_{Q(pa(\omega))} - \phi_{\omega} \Big)^{\top} \langle u(\omega) \rangle_{Q(\omega)} + \langle g(\phi^{\pi}_{\omega}) \rangle_{Q(pa(\omega))} - g(\phi_{\omega})$$

which is simple to compute locally from each node's received messages. (The term $\langle \phi^{\pi}_{\omega} \rangle_{Q(pa(\omega))}$ is computed by substituting the received parent sufficient statistic expectations into the conjugate-exponential formula for $\phi^{\pi}_{\omega}$. By multi-linearity, the expectation can be pushed into the sufficient statistics.) The global model log-likelihood bound is computed by summing the contributions from all nodes, both hidden and observed.

1.10 Summary

In this tutorial we have seen how variational methods may be used to approximate joint posteriors with mean-field distributions. A long-hand calculation of variational inference was shown, then the more general variational message passing framework was introduced, which greatly simplifies calculations for conjugate-exponential networks. Variational methods are not appropriate when the marginals or the correlation structure of the joint are required, but they are useful in model comparison tasks where the joint is integrated out.

References

Attias H (2000) A variational Bayesian framework for graphical models. In: Advances in neural information processing systems. MIT Press
Bernardo JM, Smith AFM (2000) Bayesian theory. Wiley, London
Bishop CM, Winn JM, Spiegelhalter D (2002) VIBES: a variational inference engine for Bayesian networks. In: Advances in neural information processing systems
Winn J, Bishop C (2005) Variational message passing. J Mach Learn Res 6
