Inference in Multilayer Networks via Large Deviation Bounds. Michael Kearns and Lawrence Saul. AT&T Labs Research. Shannon Laboratory. 180 Park Avenue A-235
Inference in Multilayer Networks via Large Deviation Bounds

Michael Kearns and Lawrence Saul
AT&T Labs Research, Shannon Laboratory
180 Park Avenue A-235, Florham Park, NJ
{mkearns,lsaul}@research.att.com

Abstract

We study probabilistic inference in large, layered Bayesian networks represented as directed acyclic graphs. We show that the intractability of exact inference in such networks does not preclude their effective use. We give algorithms for approximate probabilistic inference that exploit averaging phenomena occurring at nodes with large numbers of parents. We show that these algorithms compute rigorous lower and upper bounds on marginal probabilities of interest, prove that these bounds become exact in the limit of large networks, and provide rates of convergence.

1 Introduction

The promise of neural computation lies in exploiting the information processing abilities of simple computing elements organized into large networks. Arguably one of the most important types of information processing is the capacity for probabilistic reasoning. The properties of undirected probabilistic models represented as symmetric networks have been studied extensively using methods from statistical mechanics (Hertz et al., 1991). Detailed analyses of these models are possible by exploiting averaging phenomena that occur in the thermodynamic limit of large networks.

In this paper, we analyze the limit of large, multilayer networks for probabilistic models represented as directed acyclic graphs. These models are known as Bayesian networks (Pearl, 1988; Neal, 1992), and they have different probabilistic semantics than symmetric neural networks (such as Hopfield models or Boltzmann machines). We show that the intractability of exact inference in multilayer Bayesian networks
does not preclude their effective use. Our work builds on earlier studies of variational methods (Jordan et al., 1997). We give algorithms for approximate probabilistic inference that exploit averaging phenomena occurring at nodes with N ≫ 1 parents. We show that these algorithms compute rigorous lower and upper bounds on marginal probabilities of interest, prove that these bounds become exact in the limit N → ∞, and provide rates of convergence.

2 Definitions and Preliminaries

A Bayesian network is a directed graphical probabilistic model, in which the nodes represent random variables, and the links represent causal dependencies. The joint distribution of this model is obtained by composing the local conditional probability distributions (or tables), Pr[child | parents], specified at each node in the network. For networks of binary random variables, so-called transfer functions provide a convenient way to parameterize conditional probability tables (CPTs). A transfer function is a mapping f: [−∞, ∞] → [0, 1] that is everywhere differentiable and satisfies f′(x) ≥ 0 for all x (thus, f is nondecreasing). If f′(x) ≤ λ for all x, we say that f has slope λ. Common examples of transfer functions of bounded slope include the sigmoid f(x) = 1/(1 + e^{−x}), the cumulative gaussian f(x) = (1/√π) ∫_{−∞}^{x} dt e^{−t²}, and the noisy-OR f(x) = 1 − e^{−x}. Because the value of a transfer function f is bounded between 0 and 1, it can be interpreted as the conditional probability that a binary random variable takes on a particular value.

One use of transfer functions is to endow multilayer networks of soft-thresholding computing elements with probabilistic semantics. This motivates the following definition:

Definition 1 For a transfer function f, a layered probabilistic f-network has:

- Nodes representing binary variables {X_i^ℓ}, ℓ = 1, …, L and i = 1, …, N. Thus, L is the number of layers, and each layer contains N nodes.
- For every pair of nodes X_j^{ℓ−1} and X_i^ℓ in adjacent layers, a real-valued weight θ_{ij}^{ℓ−1} from X_j^{ℓ−1} to X_i^ℓ.
- For every node X_i^1 in the first layer, a bias p_i.

We will sometimes refer to nodes in layer 1 as inputs, and to nodes in layer L as outputs.
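Definition 1 translates directly into a forward sampler. The sketch below is our own illustration (function names and data layout are not from the paper), using the sigmoid as an example transfer function:

```python
import math
import random

def sigmoid(x):
    """Sigmoid transfer function f(x) = 1/(1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sample_network(biases, weights, f=sigmoid, rng=random):
    """Draw one joint sample from a layered probabilistic f-network.

    biases:  input probabilities p_i for layer 1.
    weights: L-1 matrices; weights[l][i][j] feeds node j of layer l+1
             into node i of layer l+2.
    Returns a list of L binary layers.
    """
    layer = [1 if rng.random() < p else 0 for p in biases]
    layers = [layer]
    for theta in weights:
        # Weighted sum entering each node, pushed through the transfer function.
        sums = [sum(w * x for w, x in zip(row, layer)) for row in theta]
        layer = [1 if rng.random() < f(s) else 0 for s in sums]
        layers.append(layer)
    return layers
```

Seeding the generator (`random.Random(0)`) makes the draw reproducible.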
A layered probabilistic f-network defines a joint probability distribution over all of the variables {X_i^ℓ} as follows: each input node X_i^1 is independently set to 1 with probability p_i, and to 0 with probability 1 − p_i. Inductively, given binary values X_j^{ℓ−1} = x_j^{ℓ−1} ∈ {0, 1} for all of the nodes in layer ℓ − 1, the node X_i^ℓ is set to 1 with probability f(Σ_j θ_{ij}^{ℓ−1} x_j^{ℓ−1}).

Among other uses, multilayer networks of this form have been studied as hierarchical generative models of sensory data (Hinton et al., 1995). In such applications, the fundamental computational problem (known as inference) is that of estimating the marginal probability of evidence at some number of output nodes, say the first K. (The computation of conditional probabilities, such as diagnostic queries, can be reduced to marginals via Bayes rule.) More precisely, one wishes to estimate Pr[X_1^L = x_1, …, X_K^L = x_K] (where x_i ∈ {0, 1}), a quantity whose exact computation involves an exponential sum over all the possible settings of the uninstantiated nodes in layers 1 through L − 1, and is known to be computationally intractable (Cooper, 1990).
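To make the exponential cost concrete, here is a brute-force sketch of exact inference (our own illustration, usable only for toy sizes): it sums over all 2^{N(L−1)} settings of the unobserved layers.

```python
import itertools
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def exact_output_marginal(biases, weights, evidence, f=sigmoid):
    """Exact Pr[X_1^L = x_1, ..., X_K^L = x_K] by exhaustive enumeration.

    Sums over all 2**(N * (L-1)) settings of the unobserved layers 1..L-1,
    so the running time is exponential in the network size.
    """
    n = len(biases)
    hidden = len(weights)                    # layers 1..L-1 are unobserved
    total = 0.0
    for bits in itertools.product((0, 1), repeat=n * hidden):
        layers = [bits[l * n:(l + 1) * n] for l in range(hidden)]
        prob = 1.0
        for x, p in zip(layers[0], biases):              # input layer
            prob *= p if x else 1.0 - p
        for l in range(1, hidden):                        # hidden layers 2..L-1
            for i, x in enumerate(layers[l]):
                q = f(sum(w * z for w, z in zip(weights[l - 1][i], layers[l - 1])))
                prob *= q if x else 1.0 - q
        for i, x in enumerate(evidence):                  # observed outputs
            q = f(sum(w * z for w, z in zip(weights[-1][i], layers[-1])))
            prob *= q if x else 1.0 - q
        total += prob
    return total
```

A sanity check: for fixed evidence nodes, the marginals over all joint evidence settings must sum to one.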
3 Large Deviation and Union Bounds

One of our main weapons will be the theory of large deviations. As a first illustration of this theory, consider the input nodes {X_j^1} (which are independently set to 0 or 1 according to their biases p_j) and the weighted sum Σ_j θ_{ij}^1 X_j^1 that feeds into the ith node X_i^2 in the second layer. A typical large deviation bound (Kearns & Saul, 1998) states that for all ε > 0, Pr[|Σ_j θ_{ij}^1 (X_j^1 − p_j)| > ε] ≤ 2e^{−2ε²/(NΘ²)}, where Θ is the largest weight in the network. If we make the scaling assumption that each weight θ_{ij}^1 is bounded by τ/N for some constant τ (thus, Θ ≤ τ/N), then we see that the probability of large (order 1) deviations of this weighted sum from its mean decays exponentially with N. (Our methods can also provide results under the weaker assumption that all weights are bounded by O(N^{−a}) for a > 1/2.)

How can we apply this observation to the problem of inference? Suppose we are interested in the marginal probability Pr[X_i^2 = 1]. Then the large deviation bound tells us that with probability at least 1 − δ (where we define δ = 2e^{−2Nε²/τ²}), the weighted sum at node X_i^2 will be within ε of its mean value μ_i = Σ_j θ_{ij}^1 p_j. Thus, with probability at least 1 − δ, we are assured that Pr[X_i^2 = 1] is at least f(μ_i − ε) and at most f(μ_i + ε). Of course, the flip side of the large deviation bound is that with probability at most δ, the weighted sum may fall more than ε away from μ_i. In this case we can make no guarantees on Pr[X_i^2 = 1] aside from the trivial lower and upper bounds of 0 and 1. Combining both eventualities, however, we obtain the overall bounds:

    (1 − δ) f(μ_i − ε) ≤ Pr[X_i^2 = 1] ≤ (1 − δ) f(μ_i + ε) + δ.    (1)

Equation (1) is based on a simple two-point approximation to the distribution over the weighted sum of inputs, Σ_j θ_{ij}^1 X_j^1. This approximation places one point, with weight 1 − δ, at either ε above or below the mean μ_i (depending on whether we are deriving the upper or lower bound); and the other point, with weight δ, at either −∞ or +∞. The value of δ depends on the choice of ε: in particular, as ε becomes smaller, we give more weight to the ±∞ point, with the trade-off governed by the large deviation bound.
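Equation (1) is directly computable. A minimal sketch (our own helper, assuming the sigmoid transfer function and the τ/N weight scaling described above):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def two_point_bounds(theta_row, biases, tau, eps, f=sigmoid):
    """Equation (1): lower and upper bounds on Pr[X_i^2 = 1] from the
    two-point approximation, for a single node in the second layer."""
    n = len(biases)
    mu = sum(th * p for th, p in zip(theta_row, biases))   # mean of weighted sum
    # Throw-away probability delta = 2 exp(-2 N eps^2 / tau^2), capped at 1.
    delta = min(1.0, 2.0 * math.exp(-2.0 * n * eps ** 2 / tau ** 2))
    return (1.0 - delta) * f(mu - eps), (1.0 - delta) * f(mu + eps) + delta
```

Since f is nondecreasing, f(μ_i) always lies between these two bounds, which gives a cheap consistency check.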
We regard the weight given to the ±∞ point as a throw-away probability, since with this weight we resort to the trivial bounds of 0 or 1 on the marginal probability Pr[X_i^2 = 1]. Note that the very simple bounds in Equation (1) already exhibit an interesting trade-off, governed by the choice of the parameter ε: namely, as ε becomes smaller, the throw-away probability δ becomes larger, while the terms f(μ_i ± ε) converge to the same value. Since the overall bounds involve products of f(μ_i ± ε) and 1 − δ, the optimal value of ε is the one that balances this competition between probable explanations of the evidence and improbable deviations from the mean. This trade-off is reminiscent of that encountered between energy and entropy in mean-field approximations for symmetric networks (Hertz et al., 1991).

So far we have considered the marginal probability involving a single node in the second layer. We can also compute bounds on the marginal probabilities involving K > 1 nodes in this layer (which without loss of generality we take to be the nodes X_1^2 through X_K^2). This is done by considering the probability that one or more of the weighted sums entering these K nodes in the second layer deviate by more than ε from their means. We can upper bound this probability by Kδ by appealing to the so-called union bound, which simply states that the probability of a union of events is bounded by the sum of their individual probabilities. The union bound allows us to bound marginal probabilities involving multiple variables. For example,
consider the marginal probability Pr[X_1^2 = 1, …, X_K^2 = 1]. Combining the large deviation and union bounds, we find:

    (1 − Kδ) ∏_{i=1}^K f(μ_i − ε) ≤ Pr[X_1^2 = 1, …, X_K^2 = 1] ≤ (1 − Kδ) ∏_{i=1}^K f(μ_i + ε) + Kδ.    (2)

A number of observations are in order here. First, Equation (2) directly leads to efficient algorithms for computing the upper and lower bounds. Second, although for simplicity we have considered ε-deviations of the same size at each node in the second layer, the same methods apply to different choices of ε_i (and therefore δ_i) at each node. Indeed, variations in ε_i can lead to significantly tighter bounds, and thus we exploit the freedom to choose different ε_i in the rest of the paper. This results, for example, in bounds of the form:

    (1 − Σ_{i=1}^K δ_i) ∏_{i=1}^K f(μ_i − ε_i) ≤ Pr[X_1^2 = 1, …, X_K^2 = 1],  where δ_i = 2e^{−2Nε_i²/τ²}.    (3)

The reader is invited to study the small but important differences between this lower bound and the one in Equation (2). Third, the arguments leading to bounds on the marginal probability Pr[X_1^2 = 1, …, X_K^2 = 1] generalize in a straightforward manner to other patterns of evidence besides all 1's. For instance, again just considering the lower bound, we have:

    (1 − Σ_{i=1}^K δ_i) ∏_{i: x_i = 0} [1 − f(μ_i + ε_i)] ∏_{i: x_i = 1} f(μ_i − ε_i) ≤ Pr[X_1^2 = x_1, …, X_K^2 = x_K]    (4)

where x_i ∈ {0, 1} are arbitrary binary values. Thus together the large deviation and union bounds provide the means to compute upper and lower bounds on the marginal probabilities over nodes in the second layer. Further details and consequences of these bounds for the special case of two-layer networks are given in a companion paper (Kearns & Saul, 1998); our interest here, however, is in the more challenging generalization to multilayer networks.

4 Multilayer Networks: Inference via Induction

In extending the ideas of the previous section to multilayer networks, we face the problem that the nodes in the second layer, unlike those in the first, are not independent. But we can still adopt an inductive strategy to derive bounds on marginal probabilities.
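The per-node lower bound of Equation (4) can be sketched as follows (again our own illustrative code; `evidence_lower_bound` is a hypothetical helper name, and the sigmoid is an assumed choice of f):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def evidence_lower_bound(theta, biases, tau, eps_list, evidence, f=sigmoid):
    """Equation (4): lower bound on Pr[X_1^2 = x_1, ..., X_K^2 = x_K],
    with a separate deviation eps_i (hence delta_i) per evidence node."""
    n = len(biases)
    total_delta, prod = 0.0, 1.0
    for row, eps, x in zip(theta, eps_list, evidence):
        mu = sum(th * p for th, p in zip(row, biases))
        # Union bound: throw-away probabilities add across evidence nodes.
        total_delta += 2.0 * math.exp(-2.0 * n * eps ** 2 / tau ** 2)
        prod *= f(mu - eps) if x == 1 else 1.0 - f(mu + eps)
    return max(0.0, (1.0 - total_delta) * prod)
```

When ε_i is chosen too small, the accumulated δ_i exceed 1 and the bound collapses to the trivial value 0, which illustrates the trade-off discussed above.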
The crucial observation is that conditioned on the values of the incoming weighted sums at the nodes in the second layer, the variables {X_i^2} do become independent. More generally, conditioned on these weighted sums all falling "near" their means (an event whose probability we quantified in the last section), the nodes {X_i^2} become "almost" independent. It is exactly this near-independence that we now formalize and exploit inductively to compute bounds for multilayer networks.

The first tool we require is an appropriate generalization of the large deviation bound, which does not rely on precise knowledge of the means of the random variables being summed.

Theorem 1 For all 1 ≤ i ≤ N, let X_i ∈ {0, 1} denote independent binary random variables, and let |θ_i| ≤ τ/N. Suppose that the means are bounded by |E[X_i] − p_i| ≤ Δ_i, where 0 < p_i ± Δ_i < 1. Then for all ε > (τ/N) Σ_{i=1}^N Δ_i:

    Pr[ |Σ_{i=1}^N θ_i (X_i − p_i)| > ε ] ≤ 2 exp[ −(2N/τ²) (ε − (τ/N) Σ_i Δ_i)² ].    (5)
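The right hand side of Equation (5) is cheap to evaluate. A sketch, assuming the reconstruction of the bound given above; with all Δ_i = 0 it reduces to the earlier bound 2e^{−2Nε²/τ²}:

```python
import math

def theorem1_bound(eps, n, tau, deltas):
    """Right hand side of Equation (5): a Hoeffding-style tail bound that
    tolerates uncertainty Delta_i in each mean.  Valid only when
    eps > (tau/N) * sum(Delta_i); otherwise only the trivial bound 1 holds."""
    slack = eps - (tau / n) * sum(deltas)
    if slack <= 0:
        return 1.0
    return min(1.0, 2.0 * math.exp(-(2.0 * n / tau ** 2) * slack ** 2))
```

As expected, uncertainty in the means can only weaken the bound.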
The proof of this result is omitted due to space considerations. Now for the induction, consider the nodes in the ℓth layer of the network. Suppose we are told that for every i, the weighted sum Σ_j θ_{ij}^{ℓ−1} X_j^{ℓ−1} entering into the node X_i^ℓ lies in the interval [μ_i^ℓ − ε_i^ℓ, μ_i^ℓ + ε_i^ℓ], for some choice of the μ_i^ℓ and the ε_i^ℓ. Then the mean of node X_i^ℓ is constrained to lie in the interval [p_i^ℓ − Δ_i^ℓ, p_i^ℓ + Δ_i^ℓ], where

    p_i^ℓ = ½ [ f(μ_i^ℓ − ε_i^ℓ) + f(μ_i^ℓ + ε_i^ℓ) ]    (6)
    Δ_i^ℓ = ½ [ f(μ_i^ℓ + ε_i^ℓ) − f(μ_i^ℓ − ε_i^ℓ) ].    (7)

Here we have simply run the leftmost and rightmost allowed values for the incoming weighted sums through the transfer function, and defined the interval around the mean of unit X_i^ℓ to be centered around p_i^ℓ. Thus we have translated uncertainties on the incoming weighted sums to layer ℓ into conditional uncertainties on the means of the nodes X_i^ℓ in layer ℓ. To complete the cycle, we now translate these into conditional uncertainties on the incoming weighted sums to layer ℓ + 1. In particular, conditioned on the original intervals [μ_i^ℓ − ε_i^ℓ, μ_i^ℓ + ε_i^ℓ], what is the probability that for each i, Σ_j θ_{ij}^ℓ X_j^ℓ lies inside some new interval [μ_i^{ℓ+1} − ε_i^{ℓ+1}, μ_i^{ℓ+1} + ε_i^{ℓ+1}]? In order to make some guarantee on this probability, we set μ_i^{ℓ+1} = Σ_j θ_{ij}^ℓ p_j^ℓ and assume that ε_i^{ℓ+1} > (τ/N) Σ_j Δ_j^ℓ. These conditions suffice to ensure that the new intervals contain the (conditional) expected values of the weighted sums Σ_j θ_{ij}^ℓ X_j^ℓ, and that the new intervals are large enough to encompass the incoming uncertainties. Because these conditions are a minimal requirement for establishing any probabilistic guarantees, we shall say that the [μ_i^ℓ − ε_i^ℓ, μ_i^ℓ + ε_i^ℓ] define a valid set of (μ, ε)-intervals if they meet these conditions for all 1 ≤ i ≤ N.

Given a valid set of (μ, ε)-intervals at the (ℓ + 1)th layer, it follows from Theorem 1 and the union bound that the weighted sums entering nodes in layer ℓ + 1 obey

    Pr[ |Σ_j θ_{ij}^ℓ X_j^ℓ − μ_i^{ℓ+1}| > ε_i^{ℓ+1} for some 1 ≤ i ≤ N ] ≤ Σ_{i=1}^N δ_i^{ℓ+1},    (8)

where

    δ_i^{ℓ+1} = 2 exp[ −(2N/τ²) (ε_i^{ℓ+1} − (τ/N) Σ_j Δ_j^ℓ)² ].    (9)

In what follows, we shall frequently make use of the fact that the weighted sums Σ_j θ_{ij}^ℓ X_j^ℓ are bounded by intervals [μ_i^{ℓ+1} − ε_i^{ℓ+1}, μ_i^{ℓ+1} + ε_i^{ℓ+1}]. This motivates the following definitions.
Definition 2 Given a valid set of (μ, ε)-intervals and binary values {X_j^ℓ = x_j^ℓ} for the nodes in the ℓth layer, we say that the (ℓ + 1)st layer of the network satisfies its ε-intervals if |Σ_j θ_{ij}^ℓ x_j^ℓ − μ_i^{ℓ+1}| < ε_i^{ℓ+1} for all 1 ≤ i ≤ N. Otherwise, we say that the (ℓ + 1)st layer violates its ε-intervals.

Suppose that we are given a valid set of (μ, ε)-intervals and that we sample from the joint distribution defined by the probabilistic f-network. The right hand side of Equation (8) provides an upper bound on the conditional probability that the (ℓ + 1)st layer violates its ε-intervals, given that the ℓth layer did not. This upper bound may be vacuous (that is, larger than 1), so let us denote by δ_{ℓ+1} whichever is smaller, the right hand side of Equation (8) or 1; in other words, δ_{ℓ+1} = min{ Σ_{i=1}^N δ_i^{ℓ+1}, 1 }. Since at the ℓth layer, the probability of violating the ε-intervals is at most δ_ℓ, we
are guaranteed that with probability at least ∏_{ℓ>1} [1 − δ_ℓ], all the layers satisfy their ε-intervals. Conversely, we are guaranteed that the probability that any layer violates its ε-intervals is at most 1 − ∏_{ℓ>1} [1 − δ_ℓ]. Treating this as a throw-away probability, we can now compute upper and lower bounds on marginal probabilities involving nodes at the Lth layer exactly as in the case of nodes at the second layer. This yields the following theorem.

Theorem 2 For any subset {X_1^L, …, X_K^L} of the outputs of a probabilistic f-network, for any setting x_1, …, x_K, and for any valid set of (μ, ε)-intervals, the marginal probability of partial evidence in the output layer obeys:

    ∏_{ℓ>1} [1 − δ_ℓ] ∏_{i: x_i=1} f(μ_i^L − ε_i^L) ∏_{i: x_i=0} [1 − f(μ_i^L + ε_i^L)]    (10)
        ≤ Pr[X_1^L = x_1, …, X_K^L = x_K]
        ≤ ∏_{ℓ>1} [1 − δ_ℓ] ∏_{i: x_i=1} f(μ_i^L + ε_i^L) ∏_{i: x_i=0} [1 − f(μ_i^L − ε_i^L)] + (1 − ∏_{ℓ>1} [1 − δ_ℓ]).    (11)

Theorem 2 generalizes our earlier results for marginal probabilities over nodes in the second layer; for example, compare Equation (10) to Equation (4). Again, the upper and lower bounds can be efficiently computed for all common transfer functions.

5 Rates of Convergence

To demonstrate the power of Theorem 2, we consider how the gap (or additive difference) between these upper and lower bounds on Pr[X_1^L = x_1, …, X_K^L = x_K] behaves for some crude (but informed) choices of the {ε_i^ℓ}. Our goal is to derive the rate at which these upper and lower bounds converge to the same value as we examine larger and larger networks. Suppose we choose the ε-intervals inductively by defining Δ_i^1 = 0 and setting

    ε_i^{ℓ+1} = (τ/N) Σ_j Δ_j^ℓ + τ √( ν ln N / (2N) )    (12)

for some ν > 1. From Equations (8) and (9), this choice gives 2N^{1−ν} as an upper bound on the probability that the (ℓ + 1)th layer violates its ε-intervals.
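The inductive computation of Equations (6)-(9), combined with Theorem 2 and the ε schedule just given, can be sketched end to end. This is our own illustration under simplifying assumptions (a uniform per-layer ε and the sigmoid transfer function), not the paper's optimized algorithm:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def theorem2_bounds(biases, weights, evidence, tau, nu=2.0, f=sigmoid):
    """Upper and lower bounds of Theorem 2, propagating (mu, eps)-intervals
    layer by layer with eps = (tau/N) sum(Delta) + tau sqrt(nu ln N / (2N))."""
    n = len(biases)
    pad = tau * math.sqrt(nu * math.log(n) / (2.0 * n))
    p, Delta = list(biases), [0.0] * n        # input means are known exactly
    keep = 1.0                                # product over layers of (1 - delta_l)
    for theta in weights:                     # one pass per layer 2..L
        mu = [sum(w * pj for w, pj in zip(row, p)) for row in theta]
        eps = (tau / n) * sum(Delta) + pad
        # Eqs (8)-(9): eps - (tau/N) sum(Delta) = pad, union-bounded over N nodes.
        delta_l = min(1.0, 2.0 * n * math.exp(-(2.0 * n / tau ** 2) * pad ** 2))
        keep *= 1.0 - delta_l
        # Eqs (6)-(7): intervals on the means of this layer's nodes.
        p = [0.5 * (f(m - eps) + f(m + eps)) for m in mu]
        Delta = [0.5 * (f(m + eps) - f(m - eps)) for m in mu]
    lower = upper = keep                      # mu, eps now describe the output layer
    for i, x in enumerate(evidence):
        if x == 1:
            lower *= f(mu[i] - eps)
            upper *= f(mu[i] + eps)
        else:
            lower *= 1.0 - f(mu[i] + eps)
            upper *= 1.0 - f(mu[i] - eps)
    return max(0.0, lower), min(1.0, upper + (1.0 - keep))
```

On a symmetric three-layer network (where the true output marginal is 1/2 by symmetry), the bounds bracket 1/2 with a nontrivial gap.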
Moreover, denoting the gap between the upper and lower bounds in Theorem 2 by G, it can be shown that:

    G ≤ 2λτL √( ν ln N / (2N) ) Σ_{i=1}^K [ ∏_{j≠i: x_j=1} f(μ_j^L + ε_j^L) ∏_{j≠i: x_j=0} [1 − f(μ_j^L − ε_j^L)] ] + 2L/N^{ν−1}.    (13)

Let us briefly recall the definitions of the parameters on the right hand side of this equation: λ is the maximal slope of the transfer function f, N is the number of nodes in each layer, K is the number of nodes with evidence, τ is N times the largest weight in the network, L is the number of layers, and ν > 1 is a parameter at our disposal.

The first term of this bound essentially has a 1/√N dependence on N, but is multiplied by a damping factor that we might typically expect to decay exponentially with the number K of outputs examined. To see this, simply notice that each of the factors f(μ_j^L + ε_j^L) and [1 − f(μ_j^L − ε_j^L)] is bounded by 1; furthermore,
since all the means are bounded, if N is large compared to τ then the ε_i^L are small, and each of these factors is in fact bounded by some value β < 1. Thus the first term in Equation (13) is bounded by a constant times K β^{K−1} √( ln N / N ). Since it is natural to expect the marginal probability of interest itself to decrease exponentially with K, this is desirable and natural behavior. Of course, in the case of large K, the behavior of the resulting overall bound can be dominated by the second term 2L/N^{ν−1} of Equation (13). In such situations, however, we can consider larger values of ν, possibly even of order K; indeed, for sufficiently large ν, the first term (which scales like √ν) must necessarily overtake the second one. Thus there is a clear trade-off between the two terms, as well as an optimal value of ν that sets them to be (roughly) the same magnitude. Generally speaking, for fixed K and large N, we observe that the difference between our upper and lower bounds on Pr[X_1^L = x_1, …, X_K^L = x_K] vanishes as O(√( ln N / N )).

6 An Algorithm for Fixed Multilayer Networks

We conclude by noting that the specific choices made for the parameters ε_i^ℓ in Section 5 to derive rates of convergence may be far from the optimal choices for a fixed network of interest. However, Theorem 2 directly suggests a natural algorithm for approximate probabilistic inference. In particular, regarding the upper and lower bounds on Pr[X_1^L = x_1, …, X_K^L = x_K] as functions of {ε_i^ℓ}, we can optimize these bounds by standard numerical methods. For the upper bound, we may perform gradient descent in the {ε_i^ℓ} to find a local minimum, while for the lower bound, we may perform gradient ascent to find a local maximum. The components of these gradients in both cases are easily computable for all the commonly studied transfer functions. Moreover, the constraint of maintaining valid (μ, ε)-intervals can be enforced by maintaining a floor on the ε-intervals in one layer in terms of those at the previous one. The practical application of this algorithm to interesting Bayesian networks will be studied in future work.

References

Cooper, G. (1990).
Computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence 42.

Hertz, J., Krogh, A., & Palmer, R. (1991). Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, CA.

Hinton, G., Dayan, P., Frey, B., & Neal, R. (1995). The wake-sleep algorithm for unsupervised neural networks. Science 268:1158-1161.

Jordan, M., Ghahramani, Z., Jaakkola, T., & Saul, L. (1997). An introduction to variational methods for graphical models. In M. Jordan, ed., Learning in Graphical Models. Kluwer Academic.

Kearns, M., & Saul, L. (1998). Large deviation methods for approximate probabilistic inference. In Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence.

Neal, R. (1992). Connectionist learning of belief networks. Artificial Intelligence 56:71-113.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA.
More informationCSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography
CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve
More informationMore metrics on cartesian products
More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of
More informationLecture 20: November 7
0-725/36-725: Convex Optmzaton Fall 205 Lecturer: Ryan Tbshran Lecture 20: November 7 Scrbes: Varsha Chnnaobreddy, Joon Sk Km, Lngyao Zhang Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer:
More informationChapter 9: Statistical Inference and the Relationship between Two Variables
Chapter 9: Statstcal Inference and the Relatonshp between Two Varables Key Words The Regresson Model The Sample Regresson Equaton The Pearson Correlaton Coeffcent Learnng Outcomes After studyng ths chapter,
More informationLecture 4: November 17, Part 1 Single Buffer Management
Lecturer: Ad Rosén Algorthms for the anagement of Networs Fall 2003-2004 Lecture 4: November 7, 2003 Scrbe: Guy Grebla Part Sngle Buffer anagement In the prevous lecture we taled about the Combned Input
More informationLinear Regression Analysis: Terminology and Notation
ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented
More informationSociété de Calcul Mathématique SA
Socété de Calcul Mathématque SA Outls d'ade à la décson Tools for decson help Probablstc Studes: Normalzng the Hstograms Bernard Beauzamy December, 202 I. General constructon of the hstogram Any probablstc
More informationSnce h( q^; q) = hq ~ and h( p^ ; p) = hp, one can wrte ~ h hq hp = hq ~hp ~ (7) the uncertanty relaton for an arbtrary state. The states that mnmze t
8.5: Many-body phenomena n condensed matter and atomc physcs Last moded: September, 003 Lecture. Squeezed States In ths lecture we shall contnue the dscusson of coherent states, focusng on ther propertes
More informationU-Pb Geochronology Practical: Background
U-Pb Geochronology Practcal: Background Basc Concepts: accuracy: measure of the dfference between an expermental measurement and the true value precson: measure of the reproducblty of the expermental result
More information3.1 ML and Empirical Distribution
67577 Intro. to Machne Learnng Fall semester, 2008/9 Lecture 3: Maxmum Lkelhood/ Maxmum Entropy Dualty Lecturer: Amnon Shashua Scrbe: Amnon Shashua 1 In the prevous lecture we defned the prncple of Maxmum
More informationConjugacy and the Exponential Family
CS281B/Stat241B: Advanced Topcs n Learnng & Decson Makng Conjugacy and the Exponental Famly Lecturer: Mchael I. Jordan Scrbes: Bran Mlch 1 Conjugacy In the prevous lecture, we saw conjugate prors for the
More information4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA
4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected
More informationHidden Markov Models & The Multivariate Gaussian (10/26/04)
CS281A/Stat241A: Statstcal Learnng Theory Hdden Markov Models & The Multvarate Gaussan (10/26/04) Lecturer: Mchael I. Jordan Scrbes: Jonathan W. Hu 1 Hdden Markov Models As a bref revew, hdden Markov models
More informationSupplementary Notes for Chapter 9 Mixture Thermodynamics
Supplementary Notes for Chapter 9 Mxture Thermodynamcs Key ponts Nne major topcs of Chapter 9 are revewed below: 1. Notaton and operatonal equatons for mxtures 2. PVTN EOSs for mxtures 3. General effects
More informationPsychology 282 Lecture #24 Outline Regression Diagnostics: Outliers
Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.
More informationMIMA Group. Chapter 2 Bayesian Decision Theory. School of Computer Science and Technology, Shandong University. Xin-Shun SDU
Group M D L M Chapter Bayesan Decson heory Xn-Shun Xu @ SDU School of Computer Scence and echnology, Shandong Unversty Bayesan Decson heory Bayesan decson theory s a statstcal approach to data mnng/pattern
More informationEM and Structure Learning
EM and Structure Learnng Le Song Machne Learnng II: Advanced Topcs CSE 8803ML, Sprng 2012 Partally observed graphcal models Mxture Models N(μ 1, Σ 1 ) Z X N N(μ 2, Σ 2 ) 2 Gaussan mxture model Consder
More informationExpected Value and Variance
MATH 38 Expected Value and Varance Dr. Neal, WKU We now shall dscuss how to fnd the average and standard devaton of a random varable X. Expected Value Defnton. The expected value (or average value, or
More informationCHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE
CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng
More informationThe exam is closed book, closed notes except your one-page cheat sheet.
CS 89 Fall 206 Introducton to Machne Learnng Fnal Do not open the exam before you are nstructed to do so The exam s closed book, closed notes except your one-page cheat sheet Usage of electronc devces
More informationSolution Thermodynamics
Soluton hermodynamcs usng Wagner Notaton by Stanley. Howard Department of aterals and etallurgcal Engneerng South Dakota School of nes and echnology Rapd Cty, SD 57701 January 7, 001 Soluton hermodynamcs
More informationFoundations of Arithmetic
Foundatons of Arthmetc Notaton We shall denote the sum and product of numbers n the usual notaton as a 2 + a 2 + a 3 + + a = a, a 1 a 2 a 3 a = a The notaton a b means a dvdes b,.e. ac = b where c s an
More informationComparison of Regression Lines
STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence
More informationMaximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models
ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models
More informationFREQUENCY DISTRIBUTIONS Page 1 of The idea of a frequency distribution for sets of observations will be introduced,
FREQUENCY DISTRIBUTIONS Page 1 of 6 I. Introducton 1. The dea of a frequency dstrbuton for sets of observatons wll be ntroduced, together wth some of the mechancs for constructng dstrbutons of data. Then
More informationTHE SUMMATION NOTATION Ʃ
Sngle Subscrpt otaton THE SUMMATIO OTATIO Ʃ Most of the calculatons we perform n statstcs are repettve operatons on lsts of numbers. For example, we compute the sum of a set of numbers, or the sum of the
More informationWhy BP Works STAT 232B
Why BP Works STAT 232B Free Energes Helmholz & Gbbs Free Energes 1 Dstance between Probablstc Models - K-L dvergence b{ KL b{ p{ = b{ ln { } p{ Here, p{ s the eact ont prob. b{ s the appromaton, called
More informationLecture 4: September 12
36-755: Advanced Statstcal Theory Fall 016 Lecture 4: September 1 Lecturer: Alessandro Rnaldo Scrbe: Xao Hu Ta Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer: These notes have not been
More informationComplete subgraphs in multipartite graphs
Complete subgraphs n multpartte graphs FLORIAN PFENDER Unverstät Rostock, Insttut für Mathematk D-18057 Rostock, Germany Floran.Pfender@un-rostock.de Abstract Turán s Theorem states that every graph G
More informationLimited Dependent Variables
Lmted Dependent Varables. What f the left-hand sde varable s not a contnuous thng spread from mnus nfnty to plus nfnty? That s, gven a model = f (, β, ε, where a. s bounded below at zero, such as wages
More information1 Convex Optimization
Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,
More informationEconomics 101. Lecture 4 - Equilibrium and Efficiency
Economcs 0 Lecture 4 - Equlbrum and Effcency Intro As dscussed n the prevous lecture, we wll now move from an envronment where we looed at consumers mang decsons n solaton to analyzng economes full of
More informationLossy Compression. Compromise accuracy of reconstruction for increased compression.
Lossy Compresson Compromse accuracy of reconstructon for ncreased compresson. The reconstructon s usually vsbly ndstngushable from the orgnal mage. Typcally, one can get up to 0:1 compresson wth almost
More information2 STATISTICALLY OPTIMAL TRAINING DATA 2.1 A CRITERION OF OPTIMALITY We revew the crteron of statstcally optmal tranng data (Fukumzu et al., 1994). We
Advances n Neural Informaton Processng Systems 8 Actve Learnng n Multlayer Perceptrons Kenj Fukumzu Informaton and Communcaton R&D Center, Rcoh Co., Ltd. 3-2-3, Shn-yokohama, Yokohama, 222 Japan E-mal:
More informationprinceton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg
prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there
More informationTAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES
TAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES SVANTE JANSON Abstract. We gve explct bounds for the tal probabltes for sums of ndependent geometrc or exponental varables, possbly wth dfferent
More informationThe optimal delay of the second test is therefore approximately 210 hours earlier than =2.
THE IEC 61508 FORMULAS 223 The optmal delay of the second test s therefore approxmately 210 hours earler than =2. 8.4 The IEC 61508 Formulas IEC 61508-6 provdes approxmaton formulas for the PF for smple
More informationLearning Theory: Lecture Notes
Learnng Theory: Lecture Notes Lecturer: Kamalka Chaudhur Scrbe: Qush Wang October 27, 2012 1 The Agnostc PAC Model Recall that one of the constrants of the PAC model s that the data dstrbuton has to be
More informationSome modelling aspects for the Matlab implementation of MMA
Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton
More informationSupporting Information
Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to
More informationDifference Equations
Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1
More informationClassification as a Regression Problem
Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class
More informationUncertainty and auto-correlation in. Measurement
Uncertanty and auto-correlaton n arxv:1707.03276v2 [physcs.data-an] 30 Dec 2017 Measurement Markus Schebl Federal Offce of Metrology and Surveyng (BEV), 1160 Venna, Austra E-mal: markus.schebl@bev.gv.at
More informationChapter 2 - The Simple Linear Regression Model S =0. e i is a random error. S β2 β. This is a minimization problem. Solution is a calculus exercise.
Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where y + = β + β e for =,..., y and are observable varables e s a random error How can an estmaton rule be constructed for the
More informationSolving Nonlinear Differential Equations by a Neural Network Method
Solvng Nonlnear Dfferental Equatons by a Neural Network Method Luce P. Aarts and Peter Van der Veer Delft Unversty of Technology, Faculty of Cvlengneerng and Geoscences, Secton of Cvlengneerng Informatcs,
More information