arXiv: v2 [cs.LG] 16 Sep 2009
Minimum Probability Flow Learning

Jascha Sohl-Dickstein (a,d,1), Peter Battaglino (b,d,2) and Michael R. DeWeese (b,c,d,3)
(a) Biophysics Graduate Group, (b) Department of Physics, (c) Helen Wills Neuroscience Institute, (d) Redwood Center for Theoretical Neuroscience
University of California, Berkeley, 94720
(1) jascha@berkeley.edu, (2) pbb@berkeley.edu, (3) deweese@berkeley.edu
These authors contributed equally.

Abstract

Learning in probabilistic models is often hampered by the general intractability of the normalization factor and its derivatives. Here we propose a new learning technique that obviates the need to compute an intractable normalization factor or sample from the equilibrium distribution of the model. This is achieved by establishing dynamics that would transform the observed data distribution into the model distribution, and then setting as the objective the minimization of the initial flow of probability away from the data distribution. Score matching, minimum velocity learning, and certain forms of contrastive divergence are shown to be special cases of this learning technique. We demonstrate the application of minimum probability flow learning to parameter estimation in Ising models, deep belief networks, multivariate Gaussian distributions and a continuous model with a highly general energy function defined as a power series. In the Ising model case, minimum probability flow learning outperforms current state of the art techniques by approximately two orders of magnitude in learning time, with comparable error in recovered parameters. It is our hope that this technique will alleviate existing restrictions on the classes of probabilistic models that are practical for use.

1 Introduction

Estimating parameters for probabilistic models is a fundamental problem in many scientific and engineering disciplines. Unfortunately, most probabilistic learning techniques require calculating the normalization factor, or partition function, of the probabilistic model in question, or at least calculating its gradient.
For the overwhelming majority of models there are no known analytic solutions, confining us to the highly restrictive subset of probabilistic models that can be analytically solved, or those that can be made tractable using known approximate learning techniques. Thus, development of new techniques for parameter estimation in currently intractable probabilistic models has the potential to be of great benefit, lifting near ubiquitous restrictions on how we are able to model the world.

Many approaches exist for approximate learning, including mean field theory and its expansions, variational Bayes techniques and a plethora of sampling or numerical integration based methods [22, 10, 9, 5]. Of particular interest are contrastive divergence (CD), developed by Welling, Hinton and Carreira-Perpiñán [23, 4], Hyvärinen's score matching (SM) [7], and the minimum velocity learning framework proposed by Movellan [14, 13, 15].

Contrastive divergence [23, 4] is a variation on steepest gradient descent of the maximum (log) likelihood (ML) objective function (see Footnote 1). Rather than integrating over the full model distribution, CD approximates the partition function term in the gradient by averaging over the distribution realized after taking a few Markov chain Monte Carlo (MCMC) steps away from the data distribution. Qualitatively, one can imagine that the data distribution is contrasted against a distribution which has evolved a small distance towards the model distribution, whereas it would usually be contrasted against the true model distribution. Although CD is not guaranteed to converge to the right answer, or even to a fixed point, it has proven to be an effective and fast heuristic for parameter estimation [11, 24].

Score matching, developed by Aapo Hyvärinen [7], is a method that learns parameters in a probabilistic model using only derivatives of the energy function evaluated over the data distribution (see Equation (12)). This sidesteps the need to explicitly sample or integrate over the model distribution. In score matching one minimizes the expected square distance of the score function with respect to spatial coordinates given by the data distribution from the similar score function given by the model distribution. It can be seen as an integration of the contrastive divergence gradient for infinitesimal Langevin dynamics [8], as the limit of approximating the model distribution by patching together cutouts of the model distribution around each data point [21], and finally as equivalent to minimum velocity learning [14].

Minimum velocity learning is an approach recently proposed by Movellan [14] that recasts a number of the ideas behind CD, treating the minimization of the initial dynamics away from the data distribution as the goal itself rather than a surrogate for it. Movellan's proposal is that rather than directly minimize the difference between the data and the model, one introduces system dynamics that have the model as their equilibrium distribution, and minimizes the initial flow of probability away from the data under those dynamics. If the model looks exactly like the data there will be no flow of probability, and if model and data are similar the flow of probability will tend to be minimal. Movellan applies this intuition to the specific case of distributions over continuous state spaces evolving via diffusion dynamics.
The velocity in minimum velocity learning is the difference in average drift velocities between particles diffusing under the model distribution and particles diffusing under the data distribution.

Here we provide a framework, applicable to any parametric model, of which minimum velocity, certain forms of CD, and SM are all special cases, and which is in many situations more powerful than any of these algorithms. This framework extends the ideas behind minimum velocity learning to arbitrary state spaces and a far broader class of dynamics. We show that learning under this framework is effective and fast in a number of cases: Ising models, deep belief networks (DBN), multidimensional Gaussian distributions, and a complicated two-dimensional continuous distribution.

Footnote 1: The update rule for gradient descent of the negative log likelihood, or maximum likelihood objective function, is

$$\Delta\theta \propto \frac{\partial \left[ \sum_i p_i^{(0)} \log p_i^{(\infty)}(\theta) \right]}{\partial \theta} = -\sum_i p_i^{(0)} \frac{\partial E_i(\theta)}{\partial \theta} + \sum_i p_i^{(\infty)}(\theta) \frac{\partial E_i(\theta)}{\partial \theta},$$

where $\mathbf{p}^{(0)}$ and $\mathbf{p}^{(\infty)}(\theta)$ represent the data distribution and model distribution, respectively, $E(\theta)$ is the energy function associated with the model distribution and $i$ indexes the states of the system (see Section 2.1). The second term in this gradient can be extremely difficult to compute (costing in general an amount of time exponential in the dimensionality of the system). Under contrastive divergence $\mathbf{p}^{(\infty)}(\theta)$ is replaced by samples only a few Monte Carlo steps away from the data.

2 Minimum probability flow

Our goal is to find the parameters that cause a probabilistic model to best agree with a set of (assumed iid) observations of the state of a system. We will do this by proposing dynamics that guarantee the transformation of the data distribution into the model distribution, and then minimizing the magnitude of the initial flow of probability away from the data distribution.

2.1 Distributions

The data distribution is represented by a vector $\mathbf{p}^{(0)}$, with $p_i^{(0)}$ the probability of observing the system in a state $i$. The superscript $(0)$ represents time $t = 0$ under the system dynamics, as will be described in more detail in Section 2.2. If the observations were of a two variable binary system, then $\mathbf{p}^{(0)}$ would have four entries representing the probabilities of observing states 00, 01, 10 and 11.

Our goal is to find the parameters $\theta$ that cause a model distribution $\mathbf{p}^{(\infty)}(\theta)$ to best match the data distribution $\mathbf{p}^{(0)}$. Without loss of generality, we assume the model distribution to be of the form

$$p_i^{(\infty)}(\theta) = \frac{\exp\left(-E_i(\theta)\right)}{Z(\theta)}, \tag{1}$$

where $E(\theta)$ is referred to as the energy function, and the normalizing factor $Z(\theta)$ is called the partition function,

$$Z(\theta) = \sum_i \exp\left(-E_i(\theta)\right) \tag{2}$$

(here we have set the temperature of the system to 1). The superscript $(\infty)$ indicates that this is the equilibrium distribution reached after running the dynamics for infinite time.

[Figure 1. Panels: data distribution (left), dynamics (middle), model distribution (right).]
Figure 1: Dynamics of minimum probability flow learning. Model dynamics represented by the probability flow matrix $\Gamma$ (middle) determine how probability flows from the empirical histogram of the sample data points (left) to the equilibrium distribution of the model (right) after a sufficiently long time. In this example there are only four possible states for the system, which consists of a pair of binary variables, and the particular model parameters favor state 10 whereas the data falls mostly on other states.

2.2 Dynamics

We wish to generalize Movellan's diffusion dynamics to arbitrary state spaces. To accomplish this, we observe that diffusion dynamics are a special case of dynamics governed by a master equation that enforces conservation of probability [16]:

$$\dot{p}_i^{(t)} = \sum_{j \neq i} \Gamma_{ij}(\theta)\, p_j^{(t)} - \sum_{j \neq i} \Gamma_{ji}(\theta)\, p_i^{(t)}, \tag{3}$$

where $\dot{p}_i^{(t)} = \partial p_i^{(t)} / \partial t$ is the rate of change of probability of state $i$ with time. Transition rates $\Gamma_{ij}(\theta)$ give the rate at which probability will flow from a state $j$ into a state $i$. The first term of Equation (3) represents flow of probability out of other states $j$ into the state $i$, and the second represents flow out of $i$ into other states $j$. The dependence on $\theta$ results from the requirement that the dynamics we choose cause $\mathbf{p}^{(t)}$ to flow to the equilibrium distribution $\mathbf{p}^{(\infty)}(\theta)$.
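The conservation property of Equation (3) can be checked numerically. The following is a minimal sketch, assuming arbitrary non-negative rates on a four-state system; the names `flow`, `rates` and the example probabilities are illustrative and do not come from the paper.

```python
import random

def flow(rates, p):
    """Right-hand side of the master equation, Equation (3).

    rates[i][j] is the rate at which probability flows from state j into
    state i (diagonal entries are ignored); p[j] is the probability of state j.
    """
    n = len(p)
    dp = []
    for i in range(n):
        inflow = sum(rates[i][j] * p[j] for j in range(n) if j != i)
        outflow = sum(rates[j][i] * p[i] for j in range(n) if j != i)
        dp.append(inflow - outflow)
    return dp

random.seed(0)
n_states = 4  # e.g. the states 00, 01, 10, 11 of a two-bit system
# Hypothetical rates; any non-negative values conserve probability.
rates = [[random.random() for _ in range(n_states)] for _ in range(n_states)]
p = [0.1, 0.6, 0.0, 0.3]

dp = flow(rates, p)
total_change = sum(dp)  # every unit leaving one state enters another
```

Because each term appears once as inflow and once as outflow, `total_change` vanishes up to rounding, and a state with zero probability (state 2 here) can only gain probability.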
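For readability, explicit dependence on $\theta$ will be dropped except where specifically relevant. If we choose the diagonal of $\Gamma$ to obey $\Gamma_{ii} = -\sum_{j \neq i} \Gamma_{ji}$, then we can write the dynamics as

$$\dot{\mathbf{p}}^{(t)} = \Gamma\, \mathbf{p}^{(t)} \tag{4}$$

(see Figure 1). The unique solution for $\mathbf{p}^{(t)}$ is

$$\mathbf{p}^{(t)} = \exp\left(\Gamma t\right) \mathbf{p}^{(0)}. \tag{5}$$

2.3 Detailed Balance

$\Gamma$ must be chosen such that the dynamics in Equation (4) converge to the model distribution. One way to guarantee this is by choosing $\Gamma$ such that it satisfies detailed balance for the model distribution $\mathbf{p}^{(\infty)}$, and such that there is a path through $\Gamma$ allowing mixing between any two states. Note that there is no need to restrict the dynamics defined by $\Gamma$ to those of any real physical process, such as diffusion.

Detailed balance requires that at equilibrium the probability flow from state $i$ into state $j$ equals the probability flow from $j$ into $i$,

$$\Gamma_{ji}\, p_i^{(\infty)}(\theta) = \Gamma_{ij}\, p_j^{(\infty)}(\theta), \tag{6}$$

which can be rewritten as

$$\frac{\Gamma_{ij}}{\Gamma_{ji}} = \frac{p_i^{(\infty)}(\theta)}{p_j^{(\infty)}(\theta)} = \exp\left[E_j(\theta) - E_i(\theta)\right]. \tag{7}$$

$\Gamma$ is underconstrained by the above equation. Motivated by symmetry and aesthetics, we choose as the form for the (non-zero, non-diagonal) entries in $\Gamma$

$$\Gamma_{ij} = \exp\left[\frac{1}{2}\left(E_j(\theta) - E_i(\theta)\right)\right] \quad (i \neq j). \tag{8}$$

The choice $\Gamma_{ij} = \Gamma_{ji} = 0$ also satisfies Equation (6), allowing a sparse population of $\Gamma$ for purposes of computational tractability. Theoretically, to guarantee convergence to the model distribution, the non-zero elements of $\Gamma$ must be chosen such that, given sufficient time, probability can flow between any pair of states. In practice, we will only need to consider a small fraction of the non-zero elements in $\Gamma$ (see Section 2.5).

2.4 Objective Function

The goal is to minimize the initial flow of probability away from the data distribution (Figure 2). Although other objective functions are possible for a minimum probability flow approach, we have found the $L^1$ norm to be particularly effective:

$$\hat{\theta} = \arg\min_{\theta} K(\theta), \tag{9}$$

$$K(\theta) = \left\| \dot{\mathbf{p}}^{(0)}(\theta) \right\|_1 = \sum_i \left| \dot{p}_i^{(0)}(\theta) \right|. \tag{10}$$

This objective function is uniquely zero when $\mathbf{p}^{(0)}$ and $\mathbf{p}^{(\infty)}(\theta)$ are exactly equal (although in general the relationship of $\hat{\theta}$ to the maximum likelihood solution is less clear). Some algebra gives the learning gradient with respect to $\theta$:

$$\frac{\partial K}{\partial \theta} = \frac{1}{2} \sum_{i,j} \Gamma_{ij}(\theta)\, p_j^{(0)} \left[ \frac{\partial E_j(\theta)}{\partial \theta} - \frac{\partial E_i(\theta)}{\partial \theta} \right] \left[ \mathrm{sgn}\left( \dot{p}_i^{(0)}(\theta) \right) - \mathrm{sgn}\left( \dot{p}_j^{(0)}(\theta) \right) \right]. \tag{11}$$

Note that Equations (9) through (11) do not depend on the partition function $Z(\theta)$ or its derivatives.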
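To make Equations (1), (8) and (10) concrete, here is a minimal sketch on the four states of a two-bit system. The linear energy function, the parameter values and the example data histogram are assumptions chosen for illustration, not taken from the paper.

```python
import math
from itertools import product

STATES = list(product([0, 1], repeat=2))  # 00, 01, 10, 11

def energies(theta):
    """Energy of each state; linear in theta (an illustrative choice)."""
    return [theta[0] * a + theta[1] * b + theta[2] * a * b for a, b in STATES]

def model_distribution(E):
    """Equation (1), with the partition function of Equation (2)."""
    weights = [math.exp(-e) for e in E]
    Z = sum(weights)
    return [w / Z for w in weights]

def flow_matrix(E):
    """Off-diagonal entries from Equation (8); diagonal Gamma_ii = -sum_{j != i} Gamma_ji."""
    n = len(E)
    G = [[math.exp(0.5 * (E[j] - E[i])) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        G[i][i] = -sum(G[j][i] for j in range(n) if j != i)
    return G

def objective(theta, p0):
    """K(theta): L1 norm of the initial flow Gamma p0, Equation (10)."""
    G = flow_matrix(energies(theta))
    n = len(p0)
    return sum(abs(sum(G[i][j] * p0[j] for j in range(n))) for i in range(n))

theta = [0.5, -1.0, 2.0]            # hypothetical parameters
p_model = model_distribution(energies(theta))
p_data = [0.1, 0.6, 0.0, 0.3]       # hypothetical data histogram

K_at_model = objective(theta, p_model)  # data equal to model: no flow
K_at_data = objective(theta, p_data)    # mismatched data: positive flow
```

For a system this small the partition function is trivial to compute; it is used here only to build `p_model` for the check, while `objective` itself never touches it.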
Under the constraint that $\Gamma$ does not allow probability to flow directly from one state with data to another - nearly always satisfied when the number of system states is much larger than the number of states with data - Equation (9) is equivalent to minimizing the initial rate of growth of the KL divergence between $\mathbf{p}^{(0)}$ and $\mathbf{p}^{(t)}$, $\left. \partial D_{KL}\left( \mathbf{p}^{(0)} \| \mathbf{p}^{(t)} \right) / \partial t \right|_{t=0}$ (see Appendix A). Under the same constraint, the minimum probability flow objective function $K(\theta)$ is convex for all models $\mathbf{p}^{(\infty)}(\theta)$ in the exponential family - that is, models whose energy function $E(\theta)$ is linear in their parameters $\theta$ [12] (see Appendix B).

Footnote 2: The form chosen for $\Gamma$ in Equation (4), coupled with the satisfaction of detailed balance and ergodicity introduced in Section 2.3, guarantees that there is a unique eigenvector $\mathbf{p}^{(\infty)}$ of $\Gamma$ with eigenvalue zero, and that all other eigenvalues of $\Gamma$ have negative real parts.
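The relationship between the $L^1$ objective and the KL divergence can be checked on a toy system. This is a sketch under stated assumptions: four states with hypothetical energies, data on two of them, and a flow matrix that only links states with data to states without data. In this setting the $L^1$ objective works out to twice the flow out of the data states, so minimizing one minimizes the other.

```python
import math

def sparse_flow_matrix(E, data_states):
    """Gamma from Equation (8), keeping only transitions that link a state
    with data to a state without data (all other off-diagonals are zero)."""
    n = len(E)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and ((i in data_states) != (j in data_states)):
                G[i][j] = math.exp(0.5 * (E[j] - E[i]))
    for i in range(n):
        G[i][i] = -sum(G[j][i] for j in range(n) if j != i)
    return G

E = [0.0, -1.0, 0.5, 1.5]      # hypothetical energies for four states
p0 = [0.4, 0.6, 0.0, 0.0]      # data only on states 0 and 1
data_states = {0, 1}
G = sparse_flow_matrix(E, data_states)
n = len(E)

p_dot = [sum(G[i][j] * p0[j] for j in range(n)) for i in range(n)]
K = sum(abs(v) for v in p_dot)  # L1 objective, Equation (10)

# Flow from states with data into states without data.
flow_out = sum(G[i][j] * p0[j] for j in data_states for i in range(n)
               if i not in data_states)

# Finite-difference growth rate of D_KL(p0 || p0 + h * p_dot) at t = 0.
h = 1e-6
kl = sum(p0[i] * math.log(p0[i] / (p0[i] + h * p_dot[i])) for i in data_states)
kl_rate = kl / h
```

The finite-difference step `h` is an arbitrary small value; `kl_rate` agrees with `flow_out` to first order in `h`.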
[Figure 2. Panels a-d; horizontal axis: states of the system.]
Figure 2: An illustration of the minimum probability flow objective function, which minimizes the initial flow of probability away from the data. a. Empirical histogram of the observed data over all possible states of the system. b. Model distribution that the dynamics would converge to if allowed to run for a sufficiently long time. The dynamics and model distribution are both functions of the model parameters ($\theta$). Our goal is to make the model, or equilibrium, distribution as much like the data distribution as possible. c. The distribution after starting at the data and running the dynamics for a short time period. d. The temporal derivative of the probability distribution, or probability flow, at $t = 0$. Learning is achieved by changing the model parameters so as to minimize the shaded region of this graph.

2.5 Tractability

The vector $\mathbf{p}^{(0)}$ is typically huge, as is $\Gamma$ (e.g., $2^N$ and $2^N \times 2^N$, respectively, for an $N$-bit binary system). Naïvely, this would seem to prohibit evaluation and minimization of the objective function. Fortunately, all the elements in $\mathbf{p}^{(0)}$ not corresponding to observations are zero. Since our objective function is only evaluated at time $t = 0$ this allows us to ignore all those $\Gamma_{ij}$ for which no data point exists at state $j$. Additionally, there is a great deal of flexibility as far as which elements of $\Gamma$ can be set to zero. By populating $\Gamma$ so as to connect each state to a small fixed number of additional states, the cost of the algorithm in both memory and time is $O(M)$, where $M$ is the number of observed data points, and does not depend on the number of system states.

2.6 Continuous Systems

Although we have motivated this technique using systems with a large, but finite, number of states, it generalizes in a straightforward manner to continuous systems. The flow matrix $\Gamma$ and distribution vectors $\mathbf{p}^{(t)}$ transition from being very large to being infinite in size.
$\Gamma$ can still be chosen to connect each state to a small, finite, number of additional states however, and only outgoing probability flow from states with data contributes to the objective function, so the cost of learning remains largely unchanged. In addition, for a particular pattern of connectivity in $\Gamma$ this objective function, like Movellan's [14], reduces to score matching [7] (other connectivity patterns reduce to alternate forms). Taking the limit of connections between all states within a small distance $\epsilon$ of each other, and then Taylor expanding in $\epsilon$, one can show that, up to an overall constant and scaling factor

$$K \sim K_{SM} = \sum_{x_i \in \{\text{samples}\}} \left[ \frac{1}{2} \nabla E(x_i) \cdot \nabla E(x_i) - \nabla^2 E(x_i) \right]. \tag{12}$$
This reproduces the link discovered by Movellan [14] between diffusion dynamics over continuous spaces and score matching.

[Figure 3. Panels: 40 unit Ising model (left), 100 unit Ising model (right). Axes: mean absolute correlation error versus time (sec).]
Figure 3: A demonstration of rapid fitting of the Ising model by minimum probability flow learning. The mean absolute error in the learned model's correlation matrix is shown as a function of learning time for 40 and 100 unit fully connected Ising models. Convergence is reached in about 15 seconds for 20,000 samples from the 40 unit model (left) and in about 1 minute for 100,000 samples from the 100 unit model (right). Details of the 100 unit model can be seen in Figure 4.

[Figure 4. Panels: $J$, $J_{new}$, $J - J_{new}$ (top); $C$, $C_{new}$, $C - C_{new}$ (bottom).]
Figure 4: An example 100 unit Ising model fit using minimum probability flow learning. (left) Randomly chosen Gaussian coupling matrix $J$ (top) with variance 0.04 and associated correlation matrix $C$ (bottom) for a 100 unit, fully-connected Ising model. The diagonal has been removed from the correlation matrix $C$ for increased visibility. (center) The recovered coupling and correlation matrices after minimum probability flow learning on 100,000 samples from the model in the left panels. (right) The error in recovery of the coupling and correlation matrices.

[Figure 5. Panels: network diagram with a 28x28 pixel input layer (left), confabulations (center, right).]
Figure 5: A deep belief network trained using minimum probability flow learning and contrastive divergence. (left) A four layer deep belief network was trained on the MNIST postal hand written digits dataset. (center) Confabulations after training via minimum probability flow learning. A reasonable probabilistic model for handwritten digits has been learned. (right) Confabulations after training via single step CD. Note the uneven distribution of digit occurrences.

3 Experimental Results

Matlab code implementing minimum probability flow learning for each of the following cases is available upon request. A public toolkit is under construction. All minimization was performed using Mark Schmidt's remarkably effective minFunc [17].

3.1 Ising model

The Ising model has a long and storied history in physics [3] and machine learning [1] and it has recently been found to be a surprisingly useful model for networks of neurons in the retina [18, 20]. The ability to fit Ising models to the activity of large groups of simultaneously recorded neurons is of current interest given the increasing number of these types of data sets from the retina, cortex and other brain structures.

We fit an Ising model (fully visible Boltzmann machine) of the form

$$p^{(\infty)}(\mathbf{x}; \mathbf{J}) = \frac{1}{Z(\mathbf{J})} \exp\left[ -\sum_{i,j} J_{ij} x_i x_j \right] \tag{13}$$

to a set of $N$ $d$-element iid data samples $\left\{ \mathbf{x}^{(i)} \mid i = 1 \ldots N \right\}$ generated via Gibbs sampling from an Ising model as described below, where each of the $d$ elements of $\mathbf{x}$ is either 0 or 1. Because each $x_i \in \{0, 1\}$, $x_i^2 = x_i$, we can write the energy function as

$$E(\mathbf{x}; \mathbf{J}) = \sum_{i, j \neq i} J_{ij} x_i x_j + \sum_i J_{ii} x_i. \tag{14}$$

The probability flow matrix $\Gamma$ has $2^d \times 2^d$ elements, but we allow only elements corresponding to transitions into states a single bit-flip away to be non-zero.

Figure 3 shows the average error in predicted correlations as a function of learning time for 20,000 samples from a 40 unit, fully connected Ising model. The $J_{ij}$ used were graciously provided by Broderick and coauthors, and were identical to those used for synthetic data generation in the 2008 paper "Faster solutions of the inverse pairwise Ising problem" [2]. Training was performed on 20,000 samples so as to match the number of samples used in section III.A. of Broderick et al. Note that given sufficient samples, the minimum probability flow algorithm would converge exactly to the right answer, as learning in the Ising model is convex (Appendix B), and has its global minimum at the true solution. On an 8 core 2.33 GHz Intel Xeon, the learning converges in about 15 seconds. Broderick et al. perform a similar learning task on a 100-CPU grid computing cluster, with a convergence time of approximately 200 seconds.
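The single bit-flip connectivity described above for the Ising model can be sketched as follows. This is an illustration only, not the authors' Matlab implementation: the function names, the toy data set and the two coupling matrices are hypothetical, and the quantity evaluated is the flow from observed states into unobserved neighbours (the form analysed in Appendix B) rather than the full $L^1$ norm.

```python
import math

def energy(x, J):
    """Ising energy E(x; J) = sum_ij J_ij x_i x_j for binary x in {0, 1}^d."""
    d = len(x)
    return sum(J[i][j] * x[i] * x[j] for i in range(d) for j in range(d))

def flow_objective(data, J):
    """Probability flow from observed states to unobserved single bit-flip
    neighbours, with rates from Equation (8). Cost grows with len(data),
    not with the 2^d states of the system."""
    observed = set(data)
    total = 0.0
    for x in data:
        e_x = energy(x, J)
        for n in range(len(x)):
            y = x[:n] + (1 - x[n],) + x[n + 1:]  # flip bit n
            if y in observed:
                continue  # constraint: no direct flow between data states
            total += math.exp(0.5 * (e_x - energy(y, J)))
    return total / len(data)

# Hypothetical toy data: two patterns over d = 4 binary units.
data = [(1, 1, 0, 0)] * 3 + [(0, 0, 1, 1)] * 2

J_zero = [[0.0] * 4 for _ in range(4)]
# Hypothetical couplings that lower the energy of both observed patterns.
J_fit = [[0.0, -1.0, 1.0, 1.0],
         [-1.0, 0.0, 1.0, 1.0],
         [1.0, 1.0, 0.0, -1.0],
         [1.0, 1.0, -1.0, 0.0]]

K_zero = flow_objective(data, J_zero)
K_fit = flow_objective(data, J_fit)
```

With all couplings at zero every rate equals one, so `K_zero` simply counts the four unobserved neighbours of each sample; couplings that make the observed patterns low in energy relative to their neighbours give a smaller `K_fit`. A practical implementation would compute energy differences incrementally instead of re-evaluating the full energy for each neighbour.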
Similar learning was performed for 100,000 samples from a 100 unit, fully connected, Ising model. A coupling matrix was chosen with elements randomly drawn from a Gaussian with mean 0 and variance 0.04. Using the minimum probability flow learning technique, learning took approximately 1 minute, compared to roughly 12 hours for a 100 unit (nearest neighbor coupling only) model of retinal data [19] (personal communication, J. Shlens). Figure 4 demonstrates the recovery of the coupling and correlation matrices for our fully connected Ising model, while Figure 3 shows the time course for learning.

3.2 Deep Belief Network

As a demonstration of learning on a more complex discrete valued model, we trained a 4 layer deep belief network (DBN) [6] on MNIST handwritten digits. A DBN consists of stacked restricted Boltzmann machines (RBMs), such that the hidden layer of one RBM forms the visible layer of the next. Each RBM has the form:

$$p^{(\infty)}(\mathbf{x}_{vis}, \mathbf{x}_{hid}; \mathbf{W}) = \frac{1}{Z(\mathbf{W})} \exp\left[ -\sum_{i,j} W_{ij}\, x_{vis,i}\, x_{hid,j} \right], \tag{15}$$

$$p^{(\infty)}(\mathbf{x}_{vis}; \mathbf{W}) = \frac{1}{Z(\mathbf{W})} \exp\left( \sum_j \log\left[ 1 + \exp\left( -\sum_i W_{ij}\, x_{vis,i} \right) \right] \right). \tag{16}$$

Note that sampling-free application of the minimum probability flow algorithm requires analytically marginalizing over the hidden units. RBMs were trained in sequence, starting at the bottom layer, on samples from the MNIST postal hand written digits data set. As in the Ising case, the probability flow matrix $\Gamma$ was populated so as to connect every state to all states which differed by only a single bit flip. Training was performed by both minimum probability flow and single step CD to allow a simple comparison of the two techniques (note that CD turns into full ML learning as the number of steps is increased, and that the quality of the CD answer can thus be improved at the cost of computational time by using many-step CD).

Confabulations were performed by Gibbs sampling from the top layer RBM, then propagating each sample back down to the pixel layer by way of the conditional distribution $p^{(\infty)}(\mathbf{x}_{vis} \mid \mathbf{x}_{hid}; \mathbf{W}^k)$ for each of the intermediary RBMs, where $k$ indexes the layer in the stack. As shown in Figure 5, minimum probability flow learned a good model of handwritten digits.

Figure 6: A continuous state space model fit using minimum probability flow learning. (left) Randomly chosen coupling matrix $\Sigma^{-1}$ and associated covariance matrix $\Sigma$ for a multidimensional Gaussian distribution. (center) The recovered coupling matrix $\Sigma^{-1}_{new}$ and associated covariance matrix $\Sigma_{new}$ after minimum probability flow learning on samples from the model in (left). (right) The error in recovery of the coupling and covariance matrices.

3.3 Gaussian

As an example of minimum probability flow learning applied to continuous models, we fit a multivariate Gaussian distribution to synthetic data. The model distribution has the form

$$p^{(\infty)}(\mathbf{x}; \Sigma^{-1}) = \frac{1}{Z(\Sigma^{-1})} \exp\left[ -\frac{1}{2} \mathbf{x}^T \Sigma^{-1} \mathbf{x} \right], \tag{17}$$

with vector $\mathbf{x}$ and coupling matrix $\Sigma^{-1}$. We fit to iid samples from a multidimensional Gaussian distribution.
The probability flow matrix $\Gamma$ was populated so as to connect every state to 2 additional states, chosen from a Gaussian distribution centered on the state. Results are shown in Figure 6.
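The neighbour-sampling scheme for continuous states can be illustrated in one dimension. The sketch below makes several assumptions that are not from the paper: a scalar Gaussian with energy $E(x; \lambda) = \frac{1}{2} \lambda x^2$, a sample size of 20,000, five neighbours per data point, and a proposal standard deviation of 0.5. It evaluates the average outgoing flow for three candidate values of the precision $\lambda$.

```python
import math
import random

random.seed(0)
lam_true = 1.0      # precision of the hypothetical 1-D Gaussian generating the data
n_samples = 20000
n_neighbours = 5    # states connected to each data point (illustrative value)
step = 0.5          # standard deviation of the neighbour proposal (illustrative value)

# For E(x; lam) = 0.5 * lam * x**2, the rate from x to y in Equation (8) is
# exp(0.5 * (E(x) - E(y))) = exp(0.25 * lam * (x**2 - y**2)),
# so only the differences x**2 - y**2 need to be stored.
deltas = []
for _ in range(n_samples):
    x = random.gauss(0.0, 1.0 / math.sqrt(lam_true))
    for _k in range(n_neighbours):
        y = x + random.gauss(0.0, step)  # symmetric proposal centered on x
        deltas.append(x * x - y * y)

def flow_objective(lam):
    """Average outgoing flow from data points to their sampled neighbours."""
    return sum(math.exp(0.25 * lam * d) for d in deltas) / len(deltas)

K_low, K_true, K_high = (flow_objective(lam) for lam in (0.25, 1.0, 2.0))
```

In this toy setting the flow evaluated at the generating precision is expected to be lower than at the two mismatched values, which is the behaviour a minimizer would exploit. The neighbours are drawn once and reused for every candidate $\lambda$, so the comparison is not confounded by resampling noise.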
Figure 7: A highly unconstrained, difficult to normalize, model fit using minimum probability flow learning. (left) Histogram of a complicated two-dimensional continuous distribution $p^{(0)}(x, y)$, $(x, y) \in [-1, 1]^2$. The probability of observing a sample $(x, y)$ is proportional to the pixel value at location $(x, y)$ in the histogram. Note, the image represents a distribution over $(x, y)$ values, not a sample from a distribution. (center) Scatter plot of samples drawn from the distribution in (left). (right) Histogram of learned distribution $p^{(\infty)}(x, y; \theta)$ trained in batch mode on groups of samples from the distribution in (left), where $p^{(\infty)}(x, y; \theta) \propto \exp\left( -\sum_{m=0}^{28} \sum_{n=0}^{28} \theta_{mn} L_m(x) L_n(y) \right)$ and $L_m(x)$ is the $m$th Legendre polynomial in $x$.

3.4 Power Series Energy Function

To demonstrate minimum probability flow's effectiveness in an extremely flexible, difficult to normalize, model, we learned parameters $\theta$ for a two-dimensional continuous distribution of the form

$$p^{(\infty)}(x, y; \theta) = \frac{1}{Z(\theta)} \exp\left[ -\sum_{m,n=0}^{M} \theta_{mn} L_m(x) L_n(y) \right], \tag{18}$$

where $L_m(x)$ is the $m$th order Legendre polynomial in $x$, $(x, y) \in [-1, 1]^2$, $M$ is the maximum polynomial order, and $Z(\theta)$ is the normalization factor. We fit an $M = 28$ distribution using consecutive line searches on batches of iid samples from the distribution shown on the left of Figure 7. The probability flow matrix $\Gamma$ was populated so as to connect every state with 2 other states, chosen from a uniform distribution in the range $[-1, 1]^2$. Figure 7 shows a histogram of the data distribution $p^{(0)}(x, y)$ compared to a histogram of the learned Legendre function expansion $p^{(\infty)}(x, y; \theta)$.

4 Summary

We have presented a novel framework for efficient learning in the context of any parametric model. This method was inspired by the minimum velocity approach developed by Movellan, and it reduces to that technique as well as to score matching and some forms of contrastive divergence under suitable choices for the dynamics and state space.
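By decoupling the dynamics from any specific physical process, such as diffusion, and focusing on the initial flow of probability from the data to a subset of other states chosen in part for their utility and convenience, we have arrived at a framework that is not only more general than previous approaches, but also potentially much more powerful. We expect that this framework will render some previously intractable models more amenable to estimation.

Acknowledgments

We would like to thank Javier Movellan for sharing a work in progress; Tamara Broderick, Miroslav Dudík, Gašper Tkačik, Robert E. Schapire and William Bialek for use of their Ising model coupling parameters; Jonathon Shlens for useful discussion and ground truth for his Ising model convergence times; Bruno Olshausen, Anthony Bell, Christopher Hillar, Charles Cadieu, Kilian Koepsell and the rest of the Redwood Center for many useful discussions and for comments on earlier versions of the manuscript; Ashvin Vishwanath for useful discussion; and the Canadian Institute for Advanced Research - Neural Computation and Perception Program for their financial support (JSD).

APPENDICES

A Connection to KL Divergence

We want to measure the rate of growth of the KL divergence at time $t = 0$, $\left. \partial D_{KL} / \partial t \right|_{t=0}$. The KL divergence between the data distribution, $\mathbf{p}^{(0)}$, and the distribution resulting after running the dynamics for a time $t$, $\mathbf{p}^{(t)}$, is

$$D_{KL} = \sum_i p_i^{(0)} \log p_i^{(0)} - \sum_i p_i^{(0)} \log p_i^{(t)}. \tag{A-1}$$

Note that the terms for which $p_i^{(0)} = 0$ will never contribute to this sum. To make this explicit, we rewrite the sum as being over the set of states which are non-zero in $\mathbf{p}^{(0)}$. This set is

$$D = \left\{ i : p_i^{(0)} \neq 0 \right\}. \tag{A-2}$$

We also note the complement of this set,

$$D^C = \left\{ i : p_i^{(0)} = 0 \right\}. \tag{A-3}$$

This makes the KL divergence

$$D_{KL} = \sum_{i \in D} p_i^{(0)} \log p_i^{(0)} - \sum_{i \in D} p_i^{(0)} \log p_i^{(t)}. \tag{A-4}$$

The derivative is

$$\frac{\partial D_{KL}}{\partial t} = -\sum_{i \in D} \frac{p_i^{(0)}}{p_i^{(t)}} \frac{\partial p_i^{(t)}}{\partial t} \tag{A-5}$$

$$= -\sum_{i \in D} \frac{p_i^{(0)}}{p_i^{(t)}} \sum_j \left(1 - \delta_{ij}\right) \left[ \Gamma_{ij}\, p_j^{(t)} - \Gamma_{ji}\, p_i^{(t)} \right] \tag{A-6}$$

$$= -\sum_{i \in D} \frac{p_i^{(0)}}{p_i^{(t)}} \left[ \sum_j \left(1 - \delta_{ij}\right) \Gamma_{ij}\, p_j^{(t)} - \sum_j \left(1 - \delta_{ij}\right) \Gamma_{ji}\, p_i^{(t)} \right] \tag{A-7}$$

$$= -\sum_{i \in D} \frac{p_i^{(0)}}{p_i^{(t)}} \left[ \sum_{j \in D} \left(1 - \delta_{ij}\right) \Gamma_{ij}\, p_j^{(t)} + \sum_{j \in D^C} \Gamma_{ij}\, p_j^{(t)} - \sum_{j \in D} \left(1 - \delta_{ij}\right) \Gamma_{ji}\, p_i^{(t)} - \sum_{j \in D^C} \Gamma_{ji}\, p_i^{(t)} \right]. \tag{A-8}$$

In the last line the sum over all $j$ has been broken into a sum over $D$ and its complement $D^C$. We evaluate the derivative at $t = 0$:

$$\left. \frac{\partial D_{KL}}{\partial t} \right|_{t=0} = \sum_{i \in D} \sum_{j \in D} \left(1 - \delta_{ij}\right) \Gamma_{ji}\, p_i^{(0)} + \sum_{i \in D} \sum_{j \in D^C} \Gamma_{ji}\, p_i^{(0)} - \sum_{i \in D} \sum_{j \in D} \left(1 - \delta_{ij}\right) \Gamma_{ij}\, p_j^{(0)} - \sum_{i \in D} \sum_{j \in D^C} \Gamma_{ij}\, p_j^{(0)}. \tag{A-9}$$

We can simplify this by noting that the following terms are 0:

$$\sum_{i \in D} \sum_{j \in D} \left(1 - \delta_{ij}\right) \Gamma_{ji}\, p_i^{(0)} - \sum_{i \in D} \sum_{j \in D} \left(1 - \delta_{ij}\right) \Gamma_{ij}\, p_j^{(0)} = 0, \tag{A-10}$$

$$\sum_{i \in D} \sum_{j \in D^C} \Gamma_{ij}\, p_j^{(0)} = 0. \tag{A-11}$$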
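This means that the rate of growth of the KL divergence at the data distribution, $t = 0$, is

$$\left. \frac{\partial D_{KL}}{\partial t} \right|_{t=0} = \sum_{i \in D} \sum_{j \in D^C} \Gamma_{ji}\, p_i^{(0)}. \tag{A-12}$$

That is, the rate of growth of the KL divergence is equal to the rate of probability flow from states with data to those without. This is equivalent to the minimum probability flow $L^1$ objective function in the usual case that $\Gamma$ does not allow probability to flow directly from one state with data to another.

B Convexity

As observed by Macke and Gerwinn [12], Equation (A-12) is convex for models in the exponential family. We wish to minimize

$$K = \sum_{i \in D} \sum_{j \in D^C} \Gamma_{ji}\, p_i^{(0)}. \tag{B-1}$$

$K$ has derivative

$$\frac{\partial K}{\partial \theta_m} = \sum_{i \in D} \sum_{j \in D^C} \frac{\partial \Gamma_{ji}}{\partial \theta_m}\, p_i^{(0)} \tag{B-2}$$

$$= \frac{1}{2} \sum_{i \in D} \sum_{j \in D^C} \Gamma_{ji}\, p_i^{(0)} \left( \frac{\partial E_i}{\partial \theta_m} - \frac{\partial E_j}{\partial \theta_m} \right), \tag{B-3}$$

and Hessian

$$\frac{\partial^2 K}{\partial \theta_m \partial \theta_n} = \frac{1}{4} \sum_{i \in D} \sum_{j \in D^C} \Gamma_{ji}\, p_i^{(0)} \left( \frac{\partial E_i}{\partial \theta_m} - \frac{\partial E_j}{\partial \theta_m} \right) \left( \frac{\partial E_i}{\partial \theta_n} - \frac{\partial E_j}{\partial \theta_n} \right) \tag{B-4}$$

$$+ \frac{1}{2} \sum_{i \in D} \sum_{j \in D^C} \Gamma_{ji}\, p_i^{(0)} \left( \frac{\partial^2 E_i}{\partial \theta_m \partial \theta_n} - \frac{\partial^2 E_j}{\partial \theta_m \partial \theta_n} \right). \tag{B-5}$$

The first term is a weighted sum of outer products, with non-negative weights $\frac{1}{4} \Gamma_{ji}\, p_i^{(0)}$, and is thus positive semidefinite. The second term is 0 for models in the exponential family (those with energy functions linear in their parameters). Parameter estimation for models in the exponential family is therefore convex using minimum probability flow learning, in the commonly satisfied limit that $\Gamma$ does not directly connect any two data points.

References

[1] D H Ackley, G E Hinton, and T J Sejnowski. A learning algorithm for Boltzmann machines. Cognitive Science, 9(2):147-169, 1985.
[2] T Broderick, M Dudík, G Tkačik, R Schapire, and W Bialek. Faster solutions of the inverse pairwise Ising problem. E-print arXiv, Jan 2007.
[3] S G Brush. History of the Lenz-Ising model. Reviews of Modern Physics, 39(4), Oct 1967.
[4] M A Carreira-Perpiñán and G E Hinton. On contrastive divergence (CD) learning. Technical report, Dept. of Computer Science, University of Toronto, 2004.
[5] S Haykin. Neural networks and learning machines; 3rd edition. Prentice Hall, 2008.
[6] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7), Jul 2006.
[7] A Hyvärinen. Estimation of non-normalized statistical models using score matching. Journal of Machine Learning Research, 6:695-709, 2005.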
[8] A Hyvärinen. Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. IEEE Transactions on Neural Networks, Jan 2007.
[9] T Jaakkola and M Jordan. A variational approach to Bayesian logistic regression models and their extensions. Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, Jan 1997.
[10] H Kappen and F Rodríguez. Mean field approach to learning in Boltzmann machines. Pattern Recognition Letters, Jan 1997.
[11] D MacKay. Failures of the one-step learning algorithm. Jan 2001.
[12] J Macke and S Gerwinn. Personal communication. 2009.
[13] J R Movellan. Contrastive divergence in Gaussian diffusions. Neural Computation, 20(9), 2008.
[14] J R Movellan. A minimum velocity approach to learning. Unpublished draft, Jan 2008.
[15] J R Movellan and J L McClelland. Learning continuous probability distributions with symmetric diffusion networks. Cognitive Science, 17, 1993.
[16] R Pathria. Statistical Mechanics. Butterworth Heinemann, Jan 1972.
[17] M Schmidt. minFunc. schmidtm/software/minfunc.html, 2005.
[18] E Schneidman, M J Berry 2nd, R Segev, and W Bialek. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087):1007-12, 2006.
[19] J Shlens, G D Field, J L Gauthier, M Greschner, A Sher, A M Litke, and E J Chichilnisky. The structure of large-scale synchronized firing in primate retina. Journal of Neuroscience, 29(15):5022-5031, Apr 2009.
[20] J Shlens, G D Field, J L Gauthier, M I Grivich, D Petrusca, A Sher, A M Litke, and E J Chichilnisky. The structure of multi-neuron firing patterns in primate retina. J. Neurosci., 26(32), 2006.
[21] J Sohl-Dickstein and B Olshausen. A spatial derivation of score matching. Redwood Center Technical Report, 2009.
[22] T Tanaka. Mean-field theory of Boltzmann machine learning. Physical Review Letters E, Jan 1998.
[23] M Welling and G Hinton. A new learning algorithm for mean field Boltzmann machines. Lecture Notes in Computer Science, Jan 2002.
[24] A Yuille. The convergence of contrastive divergences. Department of Statistics, UCLA. Department of Statistics Papers, 2005.
More informationWeek 5: Neural Networks
Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple
More informationResource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud
Resource Allocaton wth a Budget Constrant for Computng Independent Tasks n the Cloud Wemng Sh and Bo Hong School of Electrcal and Computer Engneerng Georga Insttute of Technology, USA 2nd IEEE Internatonal
More informationCSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography
CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve
More informationComposite Hypotheses testing
Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter
More informationU.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017
U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that
More informationHomework Assignment 3 Due in class, Thursday October 15
Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.
More informationConvergence of random processes
DS-GA 12 Lecture notes 6 Fall 216 Convergence of random processes 1 Introducton In these notes we study convergence of dscrete random processes. Ths allows to characterze phenomena such as the law of large
More informationNumerical Heat and Mass Transfer
Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and
More information1 Convex Optimization
Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,
More informationFor now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.
Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson
More information3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X
Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number
More informationChapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems
Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons
More informationHopfield networks and Boltzmann machines. Geoffrey Hinton et al. Presented by Tambet Matiisen
Hopfeld networks and Boltzmann machnes Geoffrey Hnton et al. Presented by Tambet Matsen 18.11.2014 Hopfeld network Bnary unts Symmetrcal connectons http://www.nnwj.de/hopfeld-net.html Energy functon The
More informationModule 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:
More informationThe equation of motion of a dynamical system is given by a set of differential equations. That is (1)
Dynamcal Systems Many engneerng and natural systems are dynamcal systems. For example a pendulum s a dynamcal system. State l The state of the dynamcal system specfes t condtons. For a pendulum n the absence
More informationNUMERICAL DIFFERENTIATION
NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the
More informationPhysics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1
P. Guterrez Physcs 5153 Classcal Mechancs D Alembert s Prncple and The Lagrangan 1 Introducton The prncple of vrtual work provdes a method of solvng problems of statc equlbrum wthout havng to consder the
More informationAppendix B. The Finite Difference Scheme
140 APPENDIXES Appendx B. The Fnte Dfference Scheme In ths appendx we present numercal technques whch are used to approxmate solutons of system 3.1 3.3. A comprehensve treatment of theoretcal and mplementaton
More informationUsing deep belief network modelling to characterize differences in brain morphometry in schizophrenia
Usng deep belef network modellng to characterze dfferences n bran morphometry n schzophrena Walter H. L. Pnaya * a ; Ary Gadelha b ; Orla M. Doyle c ; Crstano Noto b ; André Zugman d ; Qurno Cordero b,
More informationModule 9. Lecture 6. Duality in Assignment Problems
Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept
More informationUsing T.O.M to Estimate Parameter of distributions that have not Single Exponential Family
IOSR Journal of Mathematcs IOSR-JM) ISSN: 2278-5728. Volume 3, Issue 3 Sep-Oct. 202), PP 44-48 www.osrjournals.org Usng T.O.M to Estmate Parameter of dstrbutons that have not Sngle Exponental Famly Jubran
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationTracking with Kalman Filter
Trackng wth Kalman Flter Scott T. Acton Vrgna Image and Vdeo Analyss (VIVA), Charles L. Brown Department of Electrcal and Computer Engneerng Department of Bomedcal Engneerng Unversty of Vrgna, Charlottesvlle,
More informationGaussian Mixture Models
Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous
More informationMarkov Chain Monte Carlo Lecture 6
where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways
More informationInformation Geometry of Gibbs Sampler
Informaton Geometry of Gbbs Sampler Kazuya Takabatake Neuroscence Research Insttute AIST Central 2, Umezono 1-1-1, Tsukuba JAPAN 305-8568 k.takabatake@ast.go.jp Abstract: - Ths paper shows some nformaton
More informationEEE 241: Linear Systems
EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they
More informationChapter 9: Statistical Inference and the Relationship between Two Variables
Chapter 9: Statstcal Inference and the Relatonshp between Two Varables Key Words The Regresson Model The Sample Regresson Equaton The Pearson Correlaton Coeffcent Learnng Outcomes After studyng ths chapter,
More information4DVAR, according to the name, is a four-dimensional variational method.
4D-Varatonal Data Assmlaton (4D-Var) 4DVAR, accordng to the name, s a four-dmensonal varatonal method. 4D-Var s actually a drect generalzaton of 3D-Var to handle observatons that are dstrbuted n tme. The
More informationCIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M
CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute
More informationAppendix B: Resampling Algorithms
407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles
More informationLINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity
LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have
More informationVQ widely used in coding speech, image, and video
at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng
More information1 Matrix representations of canonical matrices
1 Matrx representatons of canoncal matrces 2-d rotaton around the orgn: ( ) cos θ sn θ R 0 = sn θ cos θ 3-d rotaton around the x-axs: R x = 1 0 0 0 cos θ sn θ 0 sn θ cos θ 3-d rotaton around the y-axs:
More informationSolving Nonlinear Differential Equations by a Neural Network Method
Solvng Nonlnear Dfferental Equatons by a Neural Network Method Luce P. Aarts and Peter Van der Veer Delft Unversty of Technology, Faculty of Cvlengneerng and Geoscences, Secton of Cvlengneerng Informatcs,
More information1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations
Physcs 171/271 -Davd Klenfeld - Fall 2005 (revsed Wnter 2011) 1 Dervaton of Rate Equatons from Sngle-Cell Conductance (Hodgkn-Huxley-lke) Equatons We consder a network of many neurons, each of whch obeys
More informationGlobal Sensitivity. Tuesday 20 th February, 2018
Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values
More informationMIMA Group. Chapter 2 Bayesian Decision Theory. School of Computer Science and Technology, Shandong University. Xin-Shun SDU
Group M D L M Chapter Bayesan Decson heory Xn-Shun Xu @ SDU School of Computer Scence and echnology, Shandong Unversty Bayesan Decson heory Bayesan decson theory s a statstcal approach to data mnng/pattern
More informationLecture 7: Boltzmann distribution & Thermodynamics of mixing
Prof. Tbbtt Lecture 7 etworks & Gels Lecture 7: Boltzmann dstrbuton & Thermodynamcs of mxng 1 Suggested readng Prof. Mark W. Tbbtt ETH Zürch 13 März 018 Molecular Drvng Forces Dll and Bromberg: Chapters
More informationEcon107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)
I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes
More informationDifference Equations
Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1
More informationINF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018
INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton
More informationWeek3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity
Week3, Chapter 4 Moton n Two Dmensons Lecture Quz A partcle confned to moton along the x axs moves wth constant acceleraton from x =.0 m to x = 8.0 m durng a 1-s tme nterval. The velocty of the partcle
More informationChapter 11: Simple Linear Regression and Correlation
Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors
Stat60: Bayesan Modelng and Inference Lecture Date: February, 00 Reference Prors Lecturer: Mchael I. Jordan Scrbe: Steven Troxler and Wayne Lee In ths lecture, we assume that θ R; n hgher-dmensons, reference
More informationCOMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS
Avalable onlne at http://sck.org J. Math. Comput. Sc. 3 (3), No., 6-3 ISSN: 97-537 COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS
More informationP R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /
Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons
More informationU-Pb Geochronology Practical: Background
U-Pb Geochronology Practcal: Background Basc Concepts: accuracy: measure of the dfference between an expermental measurement and the true value precson: measure of the reproducblty of the expermental result
More informationLearning Theory: Lecture Notes
Learnng Theory: Lecture Notes Lecturer: Kamalka Chaudhur Scrbe: Qush Wang October 27, 2012 1 The Agnostc PAC Model Recall that one of the constrants of the PAC model s that the data dstrbuton has to be
More informationThe Expectation-Maximization Algorithm
The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.
More informationFeb 14: Spatial analysis of data fields
Feb 4: Spatal analyss of data felds Mappng rregularly sampled data onto a regular grd Many analyss technques for geophyscal data requre the data be located at regular ntervals n space and/or tme. hs s
More informationErrors for Linear Systems
Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch
More informationA PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS
HCMC Unversty of Pedagogy Thong Nguyen Huu et al. A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS Thong Nguyen Huu and Hao Tran Van Department of mathematcs-nformaton,
More informationProf. Dr. I. Nasser Phys 630, T Aug-15 One_dimensional_Ising_Model
EXACT OE-DIMESIOAL ISIG MODEL The one-dmensonal Isng model conssts of a chan of spns, each spn nteractng only wth ts two nearest neghbors. The smple Isng problem n one dmenson can be solved drectly n several
More informationResearch Article Green s Theorem for Sign Data
Internatonal Scholarly Research Network ISRN Appled Mathematcs Volume 2012, Artcle ID 539359, 10 pages do:10.5402/2012/539359 Research Artcle Green s Theorem for Sgn Data Lous M. Houston The Unversty of
More informationYong Joon Ryang. 1. Introduction Consider the multicommodity transportation problem with convex quadratic cost function. 1 2 (x x0 ) T Q(x x 0 )
Kangweon-Kyungk Math. Jour. 4 1996), No. 1, pp. 7 16 AN ITERATIVE ROW-ACTION METHOD FOR MULTICOMMODITY TRANSPORTATION PROBLEMS Yong Joon Ryang Abstract. The optmzaton problems wth quadratc constrants often
More informationU.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016
U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and
More informationCHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE
CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng
More informationLogistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI
Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton
More informationThe Second Anti-Mathima on Game Theory
The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player
More informationThe exam is closed book, closed notes except your one-page cheat sheet.
CS 89 Fall 206 Introducton to Machne Learnng Fnal Do not open the exam before you are nstructed to do so The exam s closed book, closed notes except your one-page cheat sheet Usage of electronc devces
More informationComputation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models
Computaton of Hgher Order Moments from Two Multnomal Overdsperson Lkelhood Models BY J. T. NEWCOMER, N. K. NEERCHAL Department of Mathematcs and Statstcs, Unversty of Maryland, Baltmore County, Baltmore,
More information1 GSW Iterative Techniques for y = Ax
1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn
More informationProbabilistic & Unsupervised Learning
Probablstc & Unsupervsed Learnng Convex Algorthms n Approxmate Inference Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computatonal Neuroscence Unt Unversty College London Term 1, Autumn 2008 Convexty A convex
More informationLecture 4: Universal Hash Functions/Streaming Cont d
CSE 5: Desgn and Analyss of Algorthms I Sprng 06 Lecture 4: Unversal Hash Functons/Streamng Cont d Lecturer: Shayan Oves Gharan Aprl 6th Scrbe: Jacob Schreber Dsclamer: These notes have not been subjected
More informationEnsemble Methods: Boosting
Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement
More informationprinceton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg
prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there
More informationELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM
ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM An elastc wave s a deformaton of the body that travels throughout the body n all drectons. We can examne the deformaton over a perod of tme by fxng our look
More informationHidden Markov Models
CM229S: Machne Learnng for Bonformatcs Lecture 12-05/05/2016 Hdden Markov Models Lecturer: Srram Sankararaman Scrbe: Akshay Dattatray Shnde Edted by: TBD 1 Introducton For a drected graph G we can wrte
More information10-701/ Machine Learning, Fall 2005 Homework 3
10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40
More informationA new construction of 3-separable matrices via an improved decoding of Macula s construction
Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula
More informationNON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS
IJRRAS 8 (3 September 011 www.arpapress.com/volumes/vol8issue3/ijrras_8_3_08.pdf NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS H.O. Bakodah Dept. of Mathematc
More informationMaximizing the number of nonnegative subsets
Maxmzng the number of nonnegatve subsets Noga Alon Hao Huang December 1, 213 Abstract Gven a set of n real numbers, f the sum of elements of every subset of sze larger than k s negatve, what s the maxmum
More informationSecond order approximations for probability models
Second order approxmatons for probablty models lbert Kappen Department of Bophyscs Njmegen Unversty Njmegen, the Netherlands bertmbfysunnl Wm Wegernc Department of Bophyscs Njmegen Unversty Njmegen, the
More informationWhich Separator? Spring 1
Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal
More informationSTATS 306B: Unsupervised Learning Spring Lecture 10 April 30
STATS 306B: Unsupervsed Learnng Sprng 2014 Lecture 10 Aprl 30 Lecturer: Lester Mackey Scrbe: Joey Arthur, Rakesh Achanta 10.1 Factor Analyss 10.1.1 Recap Recall the factor analyss (FA) model for lnear
More informationMore metrics on cartesian products
More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of
More informationIntegrals and Invariants of Euler-Lagrange Equations
Lecture 16 Integrals and Invarants of Euler-Lagrange Equatons ME 256 at the Indan Insttute of Scence, Bengaluru Varatonal Methods and Structural Optmzaton G. K. Ananthasuresh Professor, Mechancal Engneerng,
More informationC/CS/Phy191 Problem Set 3 Solutions Out: Oct 1, 2008., where ( 00. ), so the overall state of the system is ) ( ( ( ( 00 ± 11 ), Φ ± = 1
C/CS/Phy9 Problem Set 3 Solutons Out: Oct, 8 Suppose you have two qubts n some arbtrary entangled state ψ You apply the teleportaton protocol to each of the qubts separately What s the resultng state obtaned
More information