arXiv: v2 [cs.LG] 16 Sep 2009


Minimum Probability Flow Learning

Jascha Sohl-Dickstein (a, d, 1), Peter Battaglino (b, d, 2) and Michael R. DeWeese (b, c, d, 3)

a Biophysics Graduate Group, b Department of Physics, c Helen Wills Neuroscience Institute, d Redwood Center for Theoretical Neuroscience
University of California, Berkeley, 94720
1 jascha@berkeley.edu, 2 pbb@berkeley.edu, 3 deweese@berkeley.edu
These authors contributed equally.

Abstract

Learning in probabilistic models is often hampered by the general intractability of the normalization factor and its derivatives. Here we propose a new learning technique that obviates the need to compute an intractable normalization factor or sample from the equilibrium distribution of the model. This is achieved by establishing dynamics that would transform the observed data distribution into the model distribution, and then setting as the objective the minimization of the initial flow of probability away from the data distribution. Score matching, minimum velocity learning, and certain forms of contrastive divergence are shown to be special cases of this learning technique. We demonstrate the application of minimum probability flow learning to parameter estimation in Ising models, deep belief networks, multivariate Gaussian distributions and a continuous model with a highly general energy function defined as a power series. In the Ising model case, minimum probability flow learning outperforms current state of the art techniques by approximately two orders of magnitude in learning time, with comparable error in recovered parameters. It is our hope that this technique will alleviate existing restrictions on the classes of probabilistic models that are practical for use.

1 Introduction

Estimating parameters for probabilistic models is a fundamental problem in many scientific and engineering disciplines. Unfortunately, most probabilistic learning techniques require calculating the normalization factor, or partition function, of the probabilistic model in question, or at least calculating its gradient.
For the overwhelming majority of models there are no known analytic solutions, confining us to the highly restrictive subset of probabilistic models that can be analytically solved, or those that can be made tractable using known approximate learning techniques. Thus, development of new techniques for parameter estimation in currently intractable probabilistic models has the potential to be of great benefit, lifting near ubiquitous restrictions on how we are able to model the world.

Many approaches exist for approximate learning, including mean field theory and its expansions, variational Bayes techniques and a plethora of sampling or numerical integration based methods [22, 10, 9, 5]. Of particular interest are contrastive divergence (CD), developed by Welling, Hinton and Carreira-Perpiñán [23, 4], Hyvärinen's score matching (SM) [7], and the minimum velocity learning framework proposed by Movellan [14, 13, 15].

Contrastive divergence [23, 4] is a variation on steepest gradient descent of the maximum (log) likelihood (ML) objective function. Rather than integrating over the full model distribution, CD approximates the partition function term in the gradient by averaging over the distribution realized after taking a few Markov chain Monte Carlo (MCMC) steps away from the data distribution. Qualitatively, one can imagine that the data distribution is contrasted against a distribution which has evolved a small distance towards the model distribution, whereas it would usually be contrasted against the true model distribution. Although CD is not guaranteed to converge to the right answer, or even to a fixed point, it has proven to be an effective and fast heuristic for parameter estimation [11, 24].

Score matching, developed by Aapo Hyvärinen [7], is a method that learns parameters in a probabilistic model using only derivatives of the energy function evaluated over the data distribution (see Equation (12)). This sidesteps the need to explicitly sample or integrate over the model distribution. In score matching one minimizes the expected square distance of the score function with respect to spatial coordinates given by the data distribution from the similar score function given by the model distribution. It can be seen as an integration of the contrastive divergence gradient for infinitesimal Langevin dynamics [8], as the limit of approximating the model distribution by patching together cutouts of the model distribution around each data point [21], and finally as equivalent to minimum velocity learning [14].

Minimum velocity learning is an approach recently proposed by Movellan [14] that recasts a number of the ideas behind CD, treating the minimization of the initial dynamics away from the data distribution as the goal itself rather than a surrogate for it. Movellan's proposal is that rather than directly minimize the difference between the data and the model, one introduces system dynamics that have the model as their equilibrium distribution, and minimizes the initial flow of probability away from the data under those dynamics. If the model looks exactly like the data there will be no flow of probability, and if model and data are similar the flow of probability will tend to be minimal. Movellan applies this intuition to the specific case of distributions over continuous state spaces evolving via diffusion dynamics.
The velocity in minimum velocity learning is the difference in average drift velocities between particles diffusing under the model distribution and particles diffusing under the data distribution. Here we provide a framework, applicable to any parametric model, of which minimum velocity, certain forms of CD, and SM are all special cases, and which is in many situations more powerful than any of these algorithms. This framework extends the ideas behind minimum velocity learning to arbitrary state spaces and a far broader class of dynamics. We show that learning under this framework is effective and fast in a number of cases: Ising models, deep belief networks (DBN), multidimensional Gaussian distributions, and a complicated two-dimensional continuous distribution.

Footnote 1 (on contrastive divergence): The update rule for gradient descent of the negative log likelihood, or maximum likelihood objective function, is

$$\Delta\theta \propto \frac{\partial}{\partial\theta} \sum_i p_i^{(0)} \log p_i^{(\infty)}(\theta) = -\sum_i p_i^{(0)} \frac{\partial E_i(\theta)}{\partial\theta} + \sum_i p_i^{(\infty)}(\theta) \frac{\partial E_i(\theta)}{\partial\theta},$$

where $p^{(0)}$ and $p^{(\infty)}(\theta)$ represent the data distribution and model distribution, respectively, $E(\theta)$ is the energy function associated with the model distribution and $i$ indexes the states of the system (see Section 2.1). The second term in this gradient can be extremely difficult to compute (costing in general an amount of time exponential in the dimensionality of the system). Under contrastive divergence $p^{(\infty)}(\theta)$ is replaced by samples only a few Monte Carlo steps away from the data.

2 Minimum probability flow

Our goal is to find the parameters that cause a probabilistic model to best agree with a set of (assumed iid) observations of the state of a system. We will do this by proposing dynamics that guarantee the transformation of the data distribution into the model distribution, and then minimizing the magnitude of the initial flow of probability away from the data distribution.

2.1 Distributions

The data distribution is represented by a vector $p^{(0)}$, with $p_i^{(0)}$ the probability of observing the system in a state $i$. The superscript $(0)$ represents time $t = 0$ under the system dynamics, as will be described in more detail in Section 2.2. If the observations were of a two variable binary system, then $p^{(0)}$ would have four entries representing the probabilities of observing states 00, 01, 10 and 11. Our goal is to find the parameters $\theta$ that cause a model distribution $p^{(\infty)}(\theta)$ to best match the data distribution $p^{(0)}$. Without loss of generality, we assume the model distribution to be of the form

$$p_i^{(\infty)}(\theta) = \frac{\exp(-E_i(\theta))}{Z(\theta)}, \tag{1}$$

where $E_i(\theta)$ is referred to as the energy function, and the normalizing factor $Z(\theta)$ is called the partition function,

$$Z(\theta) = \sum_i \exp(-E_i(\theta)) \tag{2}$$

(here we have set the temperature of the system to 1). The superscript $(\infty)$ indicates that this is the equilibrium distribution reached after running the dynamics for infinite time.

[Figure 1: three panels labeled data distribution (left), dynamics (middle) and model distribution (right).]

Figure 1: Dynamics of minimum probability flow learning. Model dynamics represented by the probability flow matrix $\Gamma$ (middle) determine how probability flows from the empirical histogram of the sample data points (left) to the equilibrium distribution of the model (right) after a sufficiently long time. In this example there are only four possible states for the system, which consists of a pair of binary variables, and the particular model parameters favor one state whereas the data falls mostly on other states.

2.2 Dynamics

We wish to generalize Movellan's diffusion dynamics to arbitrary state spaces. To accomplish this, we observe that diffusion dynamics are a special case of dynamics governed by a master equation that enforces conservation of probability [16]:

$$\dot{p}_i^{(t)} = \sum_{j \neq i} \Gamma_{ij}(\theta)\, p_j^{(t)} - \sum_{j \neq i} \Gamma_{ji}(\theta)\, p_i^{(t)}, \tag{3}$$

where $\dot{p}_i^{(t)} = \partial p_i^{(t)} / \partial t$ is the rate of change of probability of state $i$ with time. Transition rates $\Gamma_{ij}(\theta)$ give the rate at which probability will flow from a state $j$ into a state $i$. The first term of Equation (3) represents flow of probability out of other states $j$ into the state $i$, and the second represents flow out of $i$ into other states $j$. The dependence on $\theta$ results from the requirement that the dynamics we choose cause $p^{(t)}$ to flow to the equilibrium distribution $p^{(\infty)}(\theta)$.
For readability, explicit dependence on $\theta$ will be dropped except where specifically relevant. If we choose the diagonal of $\Gamma$ to obey $\Gamma_{ii} = -\sum_{j \neq i} \Gamma_{ji}$, then we can write the dynamics as

$$\dot{p}^{(t)} = \Gamma\, p^{(t)} \tag{4}$$

(see Figure 1). The unique solution for $p^{(t)}$ is

$$p^{(t)} = \exp(\Gamma t)\, p^{(0)}. \tag{5}$$

2.3 Detailed Balance

$\Gamma$ must be chosen such that the dynamics in Equation (4) converge to the model distribution. One way to guarantee this is by choosing $\Gamma$ such that it satisfies detailed balance for the model distribution $p^{(\infty)}$, and such that there is a path through $\Gamma$ allowing mixing between any two states. Note that there is no need to restrict the dynamics defined by $\Gamma$ to those of any real physical process, such as diffusion. Detailed balance requires that at equilibrium the probability flow from state $i$ into state $j$ equals the probability flow from $j$ into $i$,

$$\Gamma_{ji}\, p_i^{(\infty)}(\theta) = \Gamma_{ij}\, p_j^{(\infty)}(\theta), \tag{6}$$

which can be rewritten as

$$\frac{\Gamma_{ij}}{\Gamma_{ji}} = \frac{p_i^{(\infty)}(\theta)}{p_j^{(\infty)}(\theta)} = \exp\left[E_j(\theta) - E_i(\theta)\right]. \tag{7}$$

$\Gamma$ is underconstrained by the above equation. Motivated by symmetry and aesthetics, we choose as the form for the (non-zero, non-diagonal) entries in $\Gamma$

$$\Gamma_{ij} = \exp\left[\tfrac{1}{2}\left(E_j(\theta) - E_i(\theta)\right)\right] \quad (i \neq j). \tag{8}$$

The choice $\Gamma_{ij} = \Gamma_{ji} = 0$ also satisfies Equation (6), allowing a sparse population of $\Gamma$ for purposes of computational tractability. Theoretically, to guarantee convergence to the model distribution, the non-zero elements of $\Gamma$ must be chosen such that, given sufficient time, probability can flow between any pair of states. In practice, we will only need to consider a small fraction of the non-zero elements in $\Gamma$ (see Section 2.5).

2.4 Objective Function

The goal is to minimize the initial flow of probability away from the data distribution (Figure 2). Although other objective functions are possible for a minimum probability flow approach, we have found the $L_1$ norm to be particularly effective:

$$\hat{\theta} = \arg\min_\theta K(\theta), \tag{9}$$

$$K(\theta) = \left\| \dot{p}^{(0)}(\theta) \right\|_1 = \sum_i \left| \dot{p}_i^{(0)}(\theta) \right|. \tag{10}$$

This objective function is uniquely zero when $p^{(0)}$ and $p^{(\infty)}(\theta)$ are exactly equal (although in general the relationship of $\hat{\theta}$ to the maximum likelihood solution is less clear). Some algebra gives the learning gradient with respect to $\theta$:

$$\frac{\partial K}{\partial \theta} = \frac{1}{2} \sum_{i,j} \Gamma_{ij}(\theta)\, p_j^{(0)} \left[ \frac{\partial E_j(\theta)}{\partial \theta} - \frac{\partial E_i(\theta)}{\partial \theta} \right] \left[ \operatorname{sgn}\left(\dot{p}_i^{(0)}(\theta)\right) - \operatorname{sgn}\left(\dot{p}_j^{(0)}(\theta)\right) \right]. \tag{11}$$

Note that Equations (9) through (11) do not depend on the partition function $Z(\theta)$ or its derivatives.
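As a minimal sketch of Equations (4), (8) and (10), the following Python builds the flow matrix for the two-bit example system, connecting only states that differ by a single bit flip, and evaluates the $L_1$ flow objective. The function names, the energy values and the data histogram are illustrative assumptions, not taken from the authors' Matlab code.

```python
import math
from itertools import product


def flow_matrix(energies, connected):
    """Gamma[i][j] = exp((E_j - E_i) / 2) for connected i != j (Equation 8).

    Gamma[i][j] is the rate of flow from state j into state i. The diagonal
    is set so that each column sums to zero, which conserves probability.
    """
    n = len(energies)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and connected(i, j):
                gamma[i][j] = math.exp(0.5 * (energies[j] - energies[i]))
    for j in range(n):
        gamma[j][j] = -sum(gamma[i][j] for i in range(n) if i != j)
    return gamma


def model_distribution(energies):
    """Equilibrium distribution exp(-E_i) / Z (Equations 1 and 2)."""
    weights = [math.exp(-e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def flow_objective(gamma, p0):
    """K = L1 norm of the initial flow dp/dt = Gamma p at t = 0 (Equation 10)."""
    n = len(p0)
    p_dot = [sum(gamma[i][j] * p0[j] for j in range(n)) for i in range(n)]
    return sum(abs(v) for v in p_dot)


# Two-bit system with states 00, 01, 10, 11.
states = list(product([0, 1], repeat=2))


def one_bit_apart(i, j):
    return sum(a != b for a, b in zip(states[i], states[j])) == 1


energies = [0.3, -1.2, 0.8, 0.1]   # assumed, purely illustrative energies
p_data = [0.1, 0.2, 0.3, 0.4]      # assumed, purely illustrative histogram

gamma = flow_matrix(energies, one_bit_apart)
p_model = model_distribution(energies)

k_at_model = flow_objective(gamma, p_model)  # no flow when data equals model
k_at_data = flow_objective(gamma, p_data)    # positive flow otherwise
```

Because this choice of `gamma` satisfies detailed balance, the objective vanishes (up to rounding) when it is evaluated at the model distribution itself, and it is positive for the mismatched histogram.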
Under the constraint that $\Gamma$ does not allow probability to flow directly from one state with data to another (nearly always satisfied when the number of system states is much larger than the number of states with data), Equation (9) is equivalent to minimizing the initial rate of growth of the KL divergence between $p^{(0)}$ and $p^{(t)}$, $\left. \partial D_{KL}\left(p^{(0)} \| p^{(t)}\right) / \partial t \right|_{t=0}$ (see Appendix A). Under the same constraint, the minimum probability flow objective function $K(\theta)$ is convex for all models $p^{(\infty)}(\theta)$ in the exponential family, that is, models whose energy function $E(\theta)$ is linear in their parameters $\theta$ [12] (see Appendix B).

Footnote 2 (on convergence to the model distribution): The form chosen for $\Gamma$ in Equation (4), coupled with the satisfaction of detailed balance and ergodicity introduced in Section 2.3, guarantees that there is a unique eigenvector $p^{(\infty)}$ of $\Gamma$ with eigenvalue zero, and that all other eigenvalues of $\Gamma$ have negative real parts.
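The KL divergence statement can be checked numerically on a toy system. The sketch below assumes a hypothetical four-state ring in which the two states carrying data are not directly connected, and compares a finite-difference estimate of the initial KL growth rate with the flow from data states into non-data states. All names and numbers are illustrative assumptions.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q), skipping states where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Assumed toy system: four states on a ring 0-1-2-3-0, data on states 0 and 2,
# so Gamma never connects two states that both carry data.
n = 4
energies = [0.5, -0.4, 1.1, 0.2]
neighbors = {0: (1, 3), 1: (0, 2), 2: (1, 3), 3: (2, 0)}

gamma = [[0.0] * n for _ in range(n)]
for j in range(n):
    for i in neighbors[j]:
        gamma[i][j] = math.exp(0.5 * (energies[j] - energies[i]))
    gamma[j][j] = -sum(gamma[i][j] for i in neighbors[j])

p0 = [0.7, 0.0, 0.3, 0.0]
p_dot = [sum(gamma[i][j] * p0[j] for j in range(n)) for i in range(n)]

# Probability flow from states with data into states without data.
flow_out = sum(
    gamma[i][j] * p0[j]
    for j in range(n) if p0[j] > 0.0
    for i in range(n) if p0[i] == 0.0
)

# Finite-difference estimate of d/dt D_KL(p0 || p(t)) at t = 0.
eps = 1e-6
p_eps = [p0[i] + eps * p_dot[i] for i in range(n)]
kl_rate = kl_divergence(p0, p_eps) / eps

l1_flow = sum(abs(v) for v in p_dot)
```

In this sketch `kl_rate` agrees with `flow_out` to within the finite-difference error, and `l1_flow` comes out as twice `flow_out`, a constant factor that leaves the minimizer unchanged.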

[Figure 2: four panels (a to d) plotted over the states of the system.]

Figure 2: An illustration of the minimum probability flow objective function, which minimizes the initial flow of probability away from the data. a. Empirical histogram of the observed data over all possible states of the system. b. Model distribution that the dynamics would converge to if allowed to run for a sufficiently long time. The dynamics and model distribution are both functions of the model parameters ($\theta$). Our goal is to make the model, or equilibrium, distribution as much like the data distribution as possible. c. The distribution after starting at the data and running the dynamics for a short time period. d. The temporal derivative of the probability distribution, or probability flow, at $t = 0$. Learning is achieved by changing the model parameters so as to minimize the shaded region of this graph.

2.5 Tractability

The vector $p^{(0)}$ is typically huge, as is $\Gamma$ (e.g., $2^N$ and $2^N \times 2^N$, respectively, for an $N$-bit binary system). Naïvely, this would seem to prohibit evaluation and minimization of the objective function. Fortunately, all the elements in $p^{(0)}$ not corresponding to observations are zero. Since our objective function is only evaluated at time $t = 0$, this allows us to ignore all those $\Gamma_{ij}$ for which no data point exists at state $j$. Additionally, there is a great deal of flexibility as far as which elements of $\Gamma$ can be set to zero. By populating $\Gamma$ so as to connect each state to a small fixed number of additional states, the cost of the algorithm in both memory and time is $O(M)$, where $M$ is the number of observed data points, and does not depend on the number of system states.

2.6 Continuous Systems

Although we have motivated this technique using systems with a large, but finite, number of states, it generalizes in a straightforward manner to continuous systems. The flow matrix $\Gamma$ and distribution vectors $p^{(t)}$ transition from being very large to being infinite in size.
$\Gamma$ can still be chosen to connect each state to a small, finite number of additional states, however, and only outgoing probability flow from states with data contributes to the objective function, so the cost of learning remains largely unchanged. In addition, for a particular pattern of connectivity in $\Gamma$ this objective function, like Movellan's [14], reduces to score matching [7] (other connectivity patterns reduce to alternate forms). Taking the limit of connections between all states within a small distance $\epsilon$ of each other, and then Taylor expanding in $\epsilon$, one can show that, up to an overall constant and scaling factor

$$K \sim K_{SM} = \sum_{x_i \in \{\text{samples}\}} \left[ \frac{1}{2} \nabla E(x_i) \cdot \nabla E(x_i) - \nabla^2 E(x_i) \right]. \tag{12}$$
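To make the score matching objective in Equation (12) concrete, here is a small sketch for an assumed one-dimensional energy $E(x) = \lambda x^2 / 2$ with a single parameter $\lambda$ (the precision). The energy, the sample values and the function names are illustrative assumptions. For this energy $\nabla E = \lambda x$ and $\nabla^2 E = \lambda$, so setting the derivative of $K_{SM}$ with respect to $\lambda$ to zero gives $\lambda = N / \sum_i x_i^2$.

```python
def score_matching_objective(precision, samples):
    """K_SM for E(x) = precision * x**2 / 2.

    Uses dE/dx = precision * x and d2E/dx2 = precision in Equation (12).
    """
    return sum(0.5 * (precision * x) ** 2 - precision for x in samples)


def score_matching_precision(samples):
    """Closed-form minimizer of K_SM: N / sum(x_i ** 2)."""
    return len(samples) / sum(x * x for x in samples)


samples = [-1.5, -0.5, 0.25, 0.75, 2.0]  # assumed toy data
best = score_matching_precision(samples)
k_best = score_matching_objective(best, samples)
```

The minimizer is the inverse of the mean squared sample value, which for zero-mean data is the usual precision estimate, and no partition function is ever evaluated.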

Equation (12) reproduces the link discovered by Movellan [14] between diffusion dynamics over continuous spaces and score matching.

[Figure 3: two panels, one per Ising model; vertical axis: mean absolute correlation error; horizontal axis: time (sec).]

Figure 3: A demonstration of rapid fitting of the Ising model by minimum probability flow learning. The mean absolute error in the learned model's correlation matrix is shown as a function of learning time for 40 and 100 unit fully connected Ising models. Convergence is reached in about 15 seconds for 20,000 samples from the 40 unit model (left) and in about 1 minute for 100,000 samples from the 100 unit model (right). Details of the 100 unit model can be seen in Figure 4.

[Figure 4: panels showing $J$, $J_{new}$ and $J - J_{new}$ (top row) and $C$, $C_{new}$ and $C - C_{new}$ (bottom row).]

Figure 4: An example 100 unit Ising model fit using minimum probability flow learning. (left) Randomly chosen Gaussian coupling matrix $J$ (top) with variance 0.04 and associated correlation matrix $C$ (bottom) for a 100 unit, fully-connected Ising model. The diagonal has been removed from the correlation matrix $C$ for increased visibility. (center) The recovered coupling and correlation matrices after minimum probability flow learning on 100,000 samples from the model in the left panels. (right) The error in recovery of the coupling and correlation matrices.

[Figure 5, left panel: diagram of the layer stack above a 28x28 pixel input.]

Figure 5: A deep belief network trained using minimum probability flow learning and contrastive divergence. (left) A four layer deep belief network was trained on the MNIST postal handwritten digits dataset. (center) Confabulations after training via minimum probability flow learning. A reasonable probabilistic model for handwritten digits has been learned. (right) Confabulations after training via single step CD. Note the uneven distribution of digit occurrences.

3 Experimental Results

Matlab code implementing minimum probability flow learning for each of the following cases is available upon request. A public toolkit is under construction. All minimization was performed using Mark Schmidt's remarkably effective minFunc [17].

3.1 Ising model

The Ising model has a long and storied history in physics [3] and machine learning [1] and it has recently been found to be a surprisingly useful model for networks of neurons in the retina [18, 20]. The ability to fit Ising models to the activity of large groups of simultaneously recorded neurons is of current interest given the increasing number of these types of data sets from the retina, cortex and other brain structures. We fit an Ising model (fully visible Boltzmann machine) of the form

$$p^{(\infty)}(x; J) = \frac{1}{Z(J)} \exp\left[ -\sum_{i,j} J_{ij}\, x_i x_j \right] \tag{13}$$

to a set of $N$ $d$-element iid data samples $\{ x^{(i)} \mid i = 1 \ldots N \}$ generated via Gibbs sampling from an Ising model as described below, where each of the $d$ elements of $x$ is either 0 or 1. Because each $x_j \in \{0, 1\}$, $x_j^2 = x_j$, and we can write the energy function as

$$E(x; J) = \sum_{i,\, j \neq i} J_{ij}\, x_i x_j + \sum_j J_{jj}\, x_j. \tag{14}$$

The probability flow matrix $\Gamma$ has $2^d \times 2^d$ elements, but we allow only elements corresponding to transitions into states a single bit-flip away to be non-zero. Figure 3 shows the average error in predicted correlations as a function of learning time for 20,000 samples from a 40 unit, fully connected Ising model. The $J_{ij}$ used were graciously provided by Broderick and coauthors, and were identical to those used for synthetic data generation in the 2008 paper "Faster solutions of the inverse pairwise Ising problem" [2]. Training was performed on 20,000 samples so as to match the number of samples used in section III.A. of Broderick et al. Note that given sufficient samples, the minimum probability flow algorithm would converge exactly to the right answer, as learning in the Ising model is convex (Appendix B), and has its global minimum at the true solution. On an 8 core 2.33 GHz Intel Xeon, the learning converges in about 15 seconds. Broderick et al. perform a similar learning task on a 100-CPU grid computing cluster, with a convergence time of approximately 200 seconds.
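The single bit-flip connectivity keeps the objective cheap to evaluate, since only the $d$ neighbors of each observed sample enter the sum. The following is a rough sketch of that evaluation using the energy of Equation (13) and the rates of Equation (8). It is not the authors' implementation: the function names and toy inputs are assumptions, and for simplicity it does not check whether a flipped state also happens to appear in the data.

```python
import math


def ising_energy(x, J):
    """E(x; J) = sum_ij J_ij x_i x_j for a binary vector x with entries 0 or 1."""
    d = len(x)
    return sum(J[i][j] * x[i] * x[j] for i in range(d) for j in range(d))


def mpf_objective_bit_flips(samples, J):
    """Average flow out of each sample into its single bit-flip neighbors.

    Each neighbor x' of a sample x contributes exp((E(x) - E(x')) / 2),
    the rate of flow from x into x' under Equation (8).
    """
    total = 0.0
    for x in samples:
        e_x = ising_energy(x, J)
        for k in range(len(x)):
            flipped = list(x)
            flipped[k] = 1 - flipped[k]
            total += math.exp(0.5 * (e_x - ising_energy(flipped, J)))
    return total / len(samples)


samples = [[1, 1, 0], [1, 1, 0]]                 # assumed toy data
J_zero = [[0.0] * 3 for _ in range(3)]           # all couplings zero
J_fit = [[-2.0, 0.0, 0.0],                       # assumed couplings that give
         [0.0, -2.0, 0.0],                       # the observed state low energy
         [0.0, 0.0, 2.0]]

k_zero = mpf_objective_bit_flips(samples, J_zero)
k_fit = mpf_objective_bit_flips(samples, J_fit)
```

With all couplings at zero every flip contributes a rate of 1, so the objective equals the number of units. Couplings that lower the energy of the observed state relative to its neighbors reduce the flow.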
Similar learning was performed for 100,000 samples from a 100 unit, fully connected Ising model. A coupling matrix was chosen with elements randomly drawn from a Gaussian with mean 0 and variance 0.04. Using the minimum probability flow learning technique, learning took approximately 1 minute, compared to roughly 12 hours for a 100 unit (nearest neighbor coupling only) model of retinal data [19] (personal communication, J. Shlens). Figure 4 demonstrates the recovery of the coupling and correlation matrices for our fully connected Ising model, while Figure 3 shows the time course for learning.

3.2 Deep Belief Network

As a demonstration of learning on a more complex discrete valued model, we trained a 4 layer deep belief network (DBN) [6] on MNIST handwritten digits. A DBN consists of stacked restricted Boltzmann machines (RBMs), such that the hidden layer of one RBM forms the visible layer of the next. Each RBM has the form:

$$p^{(\infty)}(x_{vis}, x_{hid}; W) = \frac{1}{Z(W)} \exp\left[ -\sum_{i,j} W_{ij}\, x_{vis,i}\, x_{hid,j} \right], \tag{15}$$

$$p^{(\infty)}(x_{vis}; W) = \frac{1}{Z(W)} \exp\left[ \sum_j \log\left( 1 + \exp\left[ -\sum_i W_{ij}\, x_{vis,i} \right] \right) \right]. \tag{16}$$

Note that sampling-free application of the minimum probability flow algorithm requires analytically marginalizing over the hidden units. RBMs were trained in sequence, starting at the bottom layer, on [...] samples from the MNIST postal handwritten digits data set. As in the Ising case, the probability flow matrix $\Gamma$ was populated so as to connect every state to all states which differed by only a single bit flip. Training was performed by both minimum probability flow and single step CD to allow a simple comparison of the two techniques (note that CD turns into full ML learning as the number of steps is increased, and that the quality of the CD answer can thus be improved at the cost of computational time by using many-step CD). Confabulations were performed by Gibbs sampling from the top layer RBM, then propagating each sample back down to the pixel layer by way of the conditional distribution $p^{(\infty)}(x_{vis} \mid x_{hid}; W^k)$ for each of the intermediary RBMs, where $k$ indexes the layer in the stack. As shown in Figure 5, minimum probability flow learned a good model of handwritten digits.

[Figure 6: panels showing the true, recovered and error matrices for the coupling matrix and the covariance matrix.]

Figure 6: A continuous state space model fit using minimum probability flow learning. (left) Randomly chosen coupling matrix $\Sigma^{-1}$ and associated covariance matrix $\Sigma$ for a [...] dimensional Gaussian distribution. (center) The recovered coupling matrix $\Sigma^{-1}_{new}$ and associated covariance matrix $\Sigma_{new}$ after minimum probability flow learning on [...] samples from the model in (left). (right) The error in recovery of the coupling and covariance matrices.

3.3 Gaussian

As an example of minimum probability flow learning applied to continuous models, we fit a multivariate Gaussian distribution to synthetic data. The model distribution has the form

$$p^{(\infty)}(x; \Sigma^{-1}) = \frac{1}{Z(\Sigma^{-1})} \exp\left[ -\frac{1}{2} x^T \Sigma^{-1} x \right], \tag{17}$$

with vector $x$ and coupling matrix $\Sigma^{-1}$. We fit to [...] iid samples from a [...]-dimensional Gaussian distribution.
The probability flow matrix $\Gamma$ was populated so as to connect every state to 2 additional states, chosen from a Gaussian distribution with variance [...] centered on the state. Results are shown in Figure 6.
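The same recipe can be sketched for a one-dimensional Gaussian with energy $E(x) = \lambda x^2 / 2$: each sample is connected to a few neighbor states drawn from a Gaussian centered on it, and the objective sums the rates of Equation (8) over those connections. Everything specific here (the sample count, the number of neighbors, the step size, the grid search and all names) is an assumption chosen for illustration rather than a setting from the experiments above.

```python
import math
import random


def gaussian_energy(x, precision):
    """E(x) = precision * x**2 / 2 for a zero-mean 1D Gaussian."""
    return 0.5 * precision * x * x


def make_neighbors(samples, per_sample, step_std, rng):
    """Connect every sample to a few states drawn from a Gaussian around it."""
    return [[x + rng.gauss(0.0, step_std) for _ in range(per_sample)]
            for x in samples]


def mpf_objective(precision, samples, neighbors):
    """Average flow exp((E(x) - E(x')) / 2) from each sample x to its neighbors x'."""
    total = 0.0
    count = 0
    for x, nbrs in zip(samples, neighbors):
        e_x = gaussian_energy(x, precision)
        for y in nbrs:
            total += math.exp(0.5 * (e_x - gaussian_energy(y, precision)))
            count += 1
    return total / count


rng = random.Random(0)                 # fixed seed so the sketch is repeatable
true_precision = 4.0                   # assumed ground truth for the toy data
samples = [rng.gauss(0.0, 1.0 / math.sqrt(true_precision)) for _ in range(500)]
neighbors = make_neighbors(samples, per_sample=5, step_std=0.3, rng=rng)

# Crude one-dimensional grid search over the precision parameter.
grid = [0.5 + 0.1 * k for k in range(100)]
best = min(grid, key=lambda lam: mpf_objective(lam, samples, neighbors))
```

Since the energy is linear in the precision, each term of the objective is an exponential of a linear function of the parameter, so the objective is convex along this one-dimensional search. How close `best` lands to `true_precision` depends on the random draw and on the assumed neighbor settings.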

Figure 7: A highly unconstrained, difficult to normalize, model fit using minimum probability flow learning. (left) Histogram of a complicated two-dimensional continuous distribution $p^{(0)}(x, y)$, $(x, y) \in [-1, 1]^2$. The probability of observing a sample $(x, y)$ is proportional to the pixel value at location $(x, y)$ in the histogram. Note, the image represents a distribution over $(x, y)$ values, not a sample from a distribution. (center) Scatter plot of [...] samples drawn from the distribution in (left). (right) Histogram of learned distribution $p^{(\infty)}(x, y; \theta)$ trained in batch mode on groups of [...] samples from the distribution in (left), where $p^{(\infty)}(x, y; \theta) \propto \exp\left( -\sum_{m, n \le M} \theta_{mn} L_m(x) L_n(y) \right)$ and $L_m(x)$ is the $m$th Legendre polynomial in $x$.

3.4 Power Series Energy Function

To demonstrate minimum probability flow's effectiveness in an extremely flexible, difficult to normalize, model, we learned parameters $\theta$ for a two-dimensional continuous distribution of the form

$$p^{(\infty)}(x, y; \theta) = \frac{1}{Z(\theta)} \exp\left[ -\sum_{m, n \le M} \theta_{mn} L_m(x) L_n(y) \right], \tag{18}$$

where $L_m(x)$ is the $m$th order Legendre polynomial in $x$, $(x, y) \in [-1, 1]^2$, $M$ is the maximum polynomial order, and $Z(\theta)$ is the normalization factor. We fit an $M = 28$ distribution using consecutive line searches on batches of [...] iid samples from the distribution shown on the left of Figure 7. The probability flow matrix $\Gamma$ was populated so as to connect every state with 2 other states, chosen from a uniform distribution in the range $[-1, 1]^2$. Figure 7 shows a histogram of the data distribution $p^{(0)}(x, y)$ compared to a histogram of the learned Legendre function expansion $p^{(\infty)}(x, y; \theta)$.

4 Summary

We have presented a novel framework for efficient learning in the context of any parametric model. This method was inspired by the minimum velocity approach developed by Movellan, and it reduces to that technique as well as to score matching and some forms of contrastive divergence under suitable choices for the dynamics and state space.
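By decoupling the dynamics from any specific physical process, such as diffusion, and focusing on the initial flow of probability from the data to a subset of other states chosen in part for their utility and convenience, we have arrived at a framework that is not only more general than previous approaches, but also potentially much more powerful. We expect that this framework will render some previously intractable models more amenable to estimation.

Acknowledgments

We would like to thank Javier Movellan for sharing a work in progress; Tamara Broderick, Miroslav Dudík, Gašper Tkačik, Robert E. Schapire and William Bialek for use of their Ising model coupling parameters; Jonathon Shlens for useful discussion and ground truth for his Ising model convergence times; Bruno Olshausen, Anthony Bell, Christopher Hillar, Charles Cadieu, Kilian Koepsell and the rest of the Redwood Center for many useful discussions and for comments on earlier versions of the manuscript; Ashvin Vishwanath for useful discussion; and the Canadian Institute for Advanced Research - Neural Computation and Perception Program for their financial support (JSD).

APPENDICES

A Connection to KL Divergence

We want to measure the rate of growth of the KL divergence at time 0, $\left. \partial D_{KL}\left(p^{(0)} \| p^{(t)}\right) / \partial t \right|_{t=0}$. The KL divergence between the data distribution, $p^{(0)}$, and the distribution resulting after running the dynamics for a time $t$, $p^{(t)}$, is

$$D_{KL}\left(p^{(0)} \| p^{(t)}\right) = \sum_i p_i^{(0)} \log p_i^{(0)} - \sum_i p_i^{(0)} \log p_i^{(t)}. \tag{A-1}$$

Note that the terms for which $p_i^{(0)} = 0$ will never contribute to this sum. To make this explicit, we rewrite the sum as being over the set of states which are non-zero in $p^{(0)}$. This set is

$$D = \left\{ i : p_i^{(0)} \neq 0 \right\}. \tag{A-2}$$

We also note the complement of this set,

$$D^C = \left\{ i : p_i^{(0)} = 0 \right\}. \tag{A-3}$$

This makes the KL divergence

$$D_{KL}\left(p^{(0)} \| p^{(t)}\right) = \sum_{i \in D} p_i^{(0)} \log p_i^{(0)} - \sum_{i \in D} p_i^{(0)} \log p_i^{(t)}. \tag{A-4}$$

The derivative is

$$\frac{\partial D_{KL}}{\partial t} = -\sum_{i \in D} \frac{p_i^{(0)}}{p_i^{(t)}} \frac{\partial p_i^{(t)}}{\partial t} \tag{A-5}$$

$$= -\sum_{i \in D} \frac{p_i^{(0)}}{p_i^{(t)}} \sum_j (1 - \delta_{ij}) \left[ \Gamma_{ij}\, p_j^{(t)} - \Gamma_{ji}\, p_i^{(t)} \right] \tag{A-6}$$

$$= -\sum_{i \in D} \sum_j (1 - \delta_{ij}) \left[ \Gamma_{ij}\, \frac{p_i^{(0)}}{p_i^{(t)}}\, p_j^{(t)} - \Gamma_{ji}\, p_i^{(0)} \right] \tag{A-7}$$

$$= -\sum_{i \in D} \sum_{j \in D} (1 - \delta_{ij}) \left[ \Gamma_{ij}\, \frac{p_i^{(0)}}{p_i^{(t)}}\, p_j^{(t)} - \Gamma_{ji}\, p_i^{(0)} \right] - \sum_{i \in D} \sum_{j \in D^C} \left[ \Gamma_{ij}\, \frac{p_i^{(0)}}{p_i^{(t)}}\, p_j^{(t)} - \Gamma_{ji}\, p_i^{(0)} \right]. \tag{A-8}$$

In the last line the sum over all $j$ has been broken into a sum over $D$ and its complement $D^C$. We evaluate the derivative at $t = 0$:

$$\left. \frac{\partial D_{KL}}{\partial t} \right|_{t=0} = \sum_{i \in D} \sum_{j \in D} (1 - \delta_{ij})\, \Gamma_{ji}\, p_i^{(0)} + \sum_{i \in D} \sum_{j \in D^C} \Gamma_{ji}\, p_i^{(0)} - \sum_{i \in D} \sum_{j \in D} (1 - \delta_{ij})\, \Gamma_{ij}\, p_j^{(0)} - \sum_{i \in D} \sum_{j \in D^C} \Gamma_{ij}\, p_j^{(0)}. \tag{A-9}$$

We can simplify this by noting that the following terms are 0:

$$\sum_{i \in D} \sum_{j \in D} (1 - \delta_{ij})\, \Gamma_{ji}\, p_i^{(0)} - \sum_{i \in D} \sum_{j \in D} (1 - \delta_{ij})\, \Gamma_{ij}\, p_j^{(0)} = 0, \tag{A-10}$$

$$\sum_{i \in D} \sum_{j \in D^C} \Gamma_{ij}\, p_j^{(0)} = 0. \tag{A-11}$$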

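This means that the rate of growth of the KL divergence at the data distribution, $t = 0$, is

$$\left. \frac{\partial D_{KL}}{\partial t} \right|_{t=0} = \sum_{i \in D} \sum_{j \in D^C} \Gamma_{ji}\, p_i^{(0)}. \tag{A-12}$$

That is, the rate of growth of the KL divergence is equal to the rate of probability flow from states with data to those without. This is equivalent to the minimum probability flow $L_1$ objective function in the usual case that $\Gamma$ does not allow probability to flow directly from one state with data to another.

B Convexity

As observed by Macke and Gerwinn [12], Equation (A-12) is convex for models in the exponential family. We wish to minimize

$$K = \sum_{i \in D} \sum_{j \in D^C} \Gamma_{ji}\, p_i^{(0)}. \tag{B-1}$$

$K$ has derivative

$$\frac{\partial K}{\partial \theta_m} = \sum_{i \in D} \sum_{j \in D^C} \frac{\partial \Gamma_{ji}}{\partial \theta_m}\, p_i^{(0)} \tag{B-2}$$

$$= \frac{1}{2} \sum_{i \in D} \sum_{j \in D^C} \Gamma_{ji} \left( \frac{\partial E_i}{\partial \theta_m} - \frac{\partial E_j}{\partial \theta_m} \right) p_i^{(0)}, \tag{B-3}$$

and Hessian

$$\frac{\partial^2 K}{\partial \theta_m \partial \theta_n} = \frac{1}{4} \sum_{i \in D} \sum_{j \in D^C} \Gamma_{ji} \left( \frac{\partial E_i}{\partial \theta_m} - \frac{\partial E_j}{\partial \theta_m} \right) \left( \frac{\partial E_i}{\partial \theta_n} - \frac{\partial E_j}{\partial \theta_n} \right) p_i^{(0)} \tag{B-4}$$

$$+ \frac{1}{2} \sum_{i \in D} \sum_{j \in D^C} \Gamma_{ji} \left( \frac{\partial^2 E_i}{\partial \theta_m \partial \theta_n} - \frac{\partial^2 E_j}{\partial \theta_m \partial \theta_n} \right) p_i^{(0)}. \tag{B-5}$$

The first term is a weighted sum of outer products, with non-negative weights $\frac{1}{4} \Gamma_{ji}\, p_i^{(0)}$, and is thus positive semidefinite. The second term is 0 for models in the exponential family (those with energy functions linear in their parameters). Parameter estimation for models in the exponential family is therefore convex using minimum probability flow learning, in the commonly satisfied limit that $\Gamma$ does not directly connect any two data points.

References

[1] D H Ackley, G E Hinton, and T J Sejnowski. A learning algorithm for Boltzmann machines. Cognitive Science, 9(2):147-169, 1985.
[2] T Broderick, M Dudík, G Tkačik, R Schapire, and W Bialek. Faster solutions of the inverse pairwise Ising problem. E-print arXiv, Jan 2007.
[3] S G Brush. History of the Lenz-Ising model. Reviews of Modern Physics, 39(4), Oct 1967.
[4] M A Carreira-Perpiñán and G E Hinton. On contrastive divergence (CD) learning. Technical report, Dept. of Computer Science, University of Toronto, 2004.
[5] S Haykin. Neural networks and learning machines; 3rd edition. Prentice Hall, 2008.
[6] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7), Jul 2006.
[7] A Hyvärinen. Estimation of non-normalized statistical models using score matching. Journal of Machine Learning Research, 6:695-709, 2005.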

[8] A Hyvärinen. Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. IEEE Transactions on Neural Networks, Jan 2007.
[9] T Jaakkola and M Jordan. A variational approach to Bayesian logistic regression models and their extensions. Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, Jan 1997.
[10] H Kappen and F Rodríguez. Mean field approach to learning in Boltzmann machines. Pattern Recognition Letters, Jan 1997.
[11] D MacKay. Failures of the one-step learning algorithm. Jan 2001.
[12] J Macke and S Gerwinn. Personal communication. 2009.
[13] J R Movellan. Contrastive divergence in Gaussian diffusions. Neural Computation, 20(9), 2008.
[14] J R Movellan. A minimum velocity approach to learning. Unpublished draft, Jan 2008.
[15] J R Movellan and J L McClelland. Learning continuous probability distributions with symmetric diffusion networks. Cognitive Science, 17, 1993.
[16] R Pathria. Statistical Mechanics. Butterworth Heinemann, Jan 1972.
[17] M Schmidt. minFunc. schmidtm/software/minfunc.html, 2005.
[18] E Schneidman, M J Berry 2nd, R Segev, and W Bialek. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087):1007-1012, 2006.
[19] J Shlens, G D Field, J L Gauthier, M Greschner, A Sher, A M Litke, and E J Chichilnisky. The structure of large-scale synchronized firing in primate retina. Journal of Neuroscience, 29(15):5022-5031, Apr 2009.
[20] J Shlens, G D Field, J L Gauthier, M I Grivich, D Petrusca, A Sher, A M Litke, and E J Chichilnisky. The structure of multi-neuron firing patterns in primate retina. J. Neurosci., 26(32), 2006.
[21] J Sohl-Dickstein and B Olshausen. A spatial derivation of score matching. Redwood Center Technical Report, 2009.
[22] T Tanaka. Mean-field theory of Boltzmann machine learning. Physical Review E, Jan 1998.
[23] M Welling and G Hinton. A new learning algorithm for mean field Boltzmann machines. Lecture Notes in Computer Science, Jan 2002.
[24] A Yuille. The convergence of contrastive divergences. Department of Statistics, UCLA. Department of Statistics Papers, 2005.

Minimum Probability Flow Learning

Minimum Probability Flow Learning Mnmum Probablty Flow Learnng Jascha Sohl-Dcksten ab jascha@berkeley.edu Peter Battaglno ac pbb@berkeley.edu Mchael R. DeWeese acd deweese@berkeley.edu a Redwood Center for Theoretcal Neuroscence, b Bophyscs

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

CSC321 Tutorial 9: Review of Boltzmann machines and simulated annealing

CSC321 Tutorial 9: Review of Boltzmann machines and simulated annealing CSC321 Tutoral 9: Revew of Boltzmann machnes and smulated annealng (Sldes based on Lecture 16-18 and selected readngs) Yue L Emal: yuel@cs.toronto.edu Wed 11-12 March 19 Fr 10-11 March 21 Outlne Boltzmann

More information

Conjugacy and the Exponential Family

Conjugacy and the Exponential Family CS281B/Stat241B: Advanced Topcs n Learnng & Decson Makng Conjugacy and the Exponental Famly Lecturer: Mchael I. Jordan Scrbes: Bran Mlch 1 Conjugacy In the prevous lecture, we saw conjugate prors for the

More information

Hidden Markov Models & The Multivariate Gaussian (10/26/04)

Hidden Markov Models & The Multivariate Gaussian (10/26/04) CS281A/Stat241A: Statstcal Learnng Theory Hdden Markov Models & The Multvarate Gaussan (10/26/04) Lecturer: Mchael I. Jordan Scrbes: Jonathan W. Hu 1 Hdden Markov Models As a bref revew, hdden Markov models

More information
