
Inference in Multilayer Networks via Large Deviation Bounds

Michael Kearns and Lawrence Saul
AT&T Labs Research, Shannon Laboratory
180 Park Avenue A-235, Florham Park, NJ
{mkearns,lsaul}@research.att.com

Abstract

We study probabilistic inference in large, layered Bayesian networks represented as directed acyclic graphs. We show that the intractability of exact inference in such networks does not preclude their effective use. We give algorithms for approximate probabilistic inference that exploit averaging phenomena occurring at nodes with large numbers of parents. We show that these algorithms compute rigorous lower and upper bounds on marginal probabilities of interest, prove that these bounds become exact in the limit of large networks, and provide rates of convergence.

1 Introduction

The promise of neural computation lies in exploiting the information processing abilities of simple computing elements organized into large networks. Arguably one of the most important types of information processing is the capacity for probabilistic reasoning. The properties of undirected probabilistic models represented as symmetric networks have been studied extensively using methods from statistical mechanics (Hertz et al., 1991). Detailed analyses of these models are possible by exploiting averaging phenomena that occur in the thermodynamic limit of large networks. In this paper, we analyze the limit of large, multilayer networks for probabilistic models represented as directed acyclic graphs. These models are known as Bayesian networks (Pearl, 1988; Neal, 1992), and they have different probabilistic semantics than symmetric neural networks (such as Hopfield models or Boltzmann machines). We show that the intractability of exact inference in multilayer Bayesian networks

does not preclude their effective use. Our work builds on earlier studies of variational methods (Jordan et al., 1997). We give algorithms for approximate probabilistic inference that exploit averaging phenomena occurring at nodes with N >> 1 parents. We show that these algorithms compute rigorous lower and upper bounds on marginal probabilities of interest, prove that these bounds become exact in the limit N → ∞, and provide rates of convergence.

2 Definitions and Preliminaries

A Bayesian network is a directed graphical probabilistic model, in which the nodes represent random variables, and the links represent causal dependencies. The joint distribution of this model is obtained by composing the local conditional probability distributions (or tables), Pr[child | parents], specified at each node in the network. For networks of binary random variables, so-called transfer functions provide a convenient way to parameterize conditional probability tables (CPTs). A transfer function is a mapping f : [-∞, ∞] → [0, 1] that is everywhere differentiable and satisfies f'(x) ≥ 0 for all x (thus, f is nondecreasing). If f'(x) ≤ α for all x, we say that f has slope α. Common examples of transfer functions of bounded slope include the sigmoid f(x) = 1/(1 + e^{-x}), the cumulative gaussian f(x) = (1/√π) ∫_{-∞}^{x} dt e^{-t²}, and the noisy-OR f(x) = 1 - e^{-x}. Because the value of a transfer function f is bounded between 0 and 1, it can be interpreted as the conditional probability that a binary random variable takes on a particular value.

One use of transfer functions is to endow multilayer networks of soft-thresholding computing elements with probabilistic semantics. This motivates the following definition:

Definition 1 For a transfer function f, a layered probabilistic f-network has:

- Nodes representing binary variables {X_i^ℓ}, ℓ = 1,...,L and i = 1,...,N. Thus, L is the number of layers, and each layer contains N nodes.
- For every pair of nodes X_j^{ℓ-1} and X_i^ℓ in adjacent layers, a real-valued weight θ_ij^{ℓ-1} from X_j^{ℓ-1} to X_i^ℓ.
- For every node X_i^1 in the first layer, a bias p_i.

We will sometimes refer to nodes in layer 1 as inputs, and to nodes in layer L as outputs. A layered probabilistic f-network defines a joint probability distribution over all of the variables {X_i^ℓ} as follows: each input node X_i^1 is independently set to 1 with probability p_i, and to 0 with probability 1 - p_i. Inductively, given binary values X_j^{ℓ-1} = x_j^{ℓ-1} ∈ {0, 1} for all of the nodes in layer ℓ-1, the node X_i^ℓ is set to 1 with probability f(Σ_j θ_ij^{ℓ-1} x_j^{ℓ-1}).

Among other uses, multilayer networks of this form have been studied as hierarchical generative models of sensory data (Hinton et al., 1995). In such applications, the fundamental computational problem (known as inference) is that of estimating the marginal probability of evidence at some number of output nodes, say the first K. (The computation of conditional probabilities, such as diagnostic queries, can be reduced to marginals via Bayes rule.) More precisely, one wishes to estimate Pr[X_1^L = x_1, ..., X_K^L = x_K] (where x_i ∈ {0, 1}), a quantity whose exact computation involves an exponential sum over all the possible settings of the uninstantiated nodes in layers 1 through L-1, and is known to be computationally intractable (Cooper, 1990).
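To make the sampling semantics of Definition 1 concrete, here is a minimal sketch of how one might draw a joint sample from a layered probabilistic f-network, assuming a sigmoid transfer function and NumPy; the function names and array layout are illustrative rather than taken from the paper.

```python
import numpy as np

def sigmoid(x):
    # Sigmoid transfer function f(x) = 1 / (1 + exp(-x)).
    return 1.0 / (1.0 + np.exp(-x))

def sample_f_network(weights, biases, f=sigmoid, rng=None):
    """Draw one joint sample from a layered probabilistic f-network.

    weights: list of (N, N) arrays; weights[l][i, j] is the weight theta_ij
             from node j in layer l+1 to node i in layer l+2.
    biases:  length-N array of first-layer probabilities p_i.
    """
    rng = np.random.default_rng() if rng is None else rng
    layers = []
    # Layer 1: each input node is independently 1 with probability p_i.
    x = (rng.random(len(biases)) < biases).astype(float)
    layers.append(x)
    # Layers 2..L: node i fires with probability f(sum_j theta_ij x_j).
    for theta in weights:
        x = (rng.random(theta.shape[0]) < f(theta @ x)).astype(float)
        layers.append(x)
    return layers
```

Averaging many such samples would give a Monte Carlo estimate of the marginal probabilities that the bounds developed in the following sections bracket analytically.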

3 Large Deviation and Union Bounds

One of our main weapons will be the theory of large deviations. As a first illustration of this theory, consider the input nodes {X_j^1} (which are independently set to 0 or 1 according to their biases p_j) and the weighted sum Σ_j θ_ij^1 X_j^1 that feeds into the ith node X_i^2 in the second layer. A typical large deviation bound (Kearns & Saul, 1998) states that for all ε > 0, Pr[ |Σ_j θ_ij^1 (X_j^1 - p_j)| > ε ] ≤ 2e^{-2ε²/(NΘ²)}, where Θ is the largest weight in the network. If we make the scaling assumption that each weight θ_ij^1 is bounded by τ/N for some constant τ (thus, τ = NΘ), then we see that the probability of large (order 1) deviations of this weighted sum from its mean decays exponentially with N. (Our methods can also provide results under the weaker assumption that all weights are bounded by O(N^{-a}) for a > 1/2.)

How can we apply this observation to the problem of inference? Suppose we are interested in the marginal probability Pr[X_i^2 = 1]. Then the large deviation bound tells us that with probability at least 1 - δ (where we define δ = 2e^{-2Nε²/τ²}), the weighted sum at node X_i^2 will be within ε of its mean value μ_i = Σ_j θ_ij^1 p_j. Thus, with probability at least 1 - δ, we are assured that Pr[X_i^2 = 1] is at least f(μ_i - ε) and at most f(μ_i + ε). Of course, the flip side of the large deviation bound is that with probability at most δ, the weighted sum may fall more than ε away from μ_i. In this case we can make no guarantees on Pr[X_i^2 = 1] aside from the trivial lower and upper bounds of 0 and 1. Combining both eventualities, however, we obtain the overall bounds:

(1 - δ) f(μ_i - ε) ≤ Pr[X_i^2 = 1] ≤ (1 - δ) f(μ_i + ε) + δ.   (1)

Equation (1) is based on a simple two-point approximation to the distribution over the weighted sum of inputs, Σ_j θ_ij^1 X_j^1. This approximation places one point, with weight 1 - δ, at either ε above or ε below the mean (depending on whether we are deriving the upper or lower bound); and the other point, with weight δ, at either -∞ or +∞. The value of δ depends on the choice of ε: in particular, as ε becomes smaller, we give more weight to the ±∞ point, with the trade-off governed by the large deviation bound. We regard the weight given to the ±∞ point as a throw-away probability, since with this weight we resort to the trivial bounds of 0 or 1 on the marginal probability Pr[X_i^2 = 1].

Note that the very simple bounds in Equation (1) already exhibit an interesting trade-off, governed by the choice of the parameter ε: namely, as ε becomes smaller, the throw-away probability δ becomes larger, while the terms f(μ_i - ε) and f(μ_i + ε) converge to the same value. Since the overall bounds involve products of these terms with 1 - δ, the optimal value of ε is the one that balances this competition between probable explanations of the evidence and improbable deviations from the mean. This trade-off is reminiscent of that encountered between energy and entropy in mean-field approximations for symmetric networks (Hertz et al., 1991).
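The bounds of Equation (1) are straightforward to evaluate once ε is chosen, and one simple (if crude) way to resolve the trade-off just described is a grid search over ε, performed separately for the lower and the upper bound. The sketch below assumes NumPy and the scaling form δ = 2e^{-2Nε²/τ²} used above; the helper names and the grid are illustrative, not the paper's prescription.

```python
import numpy as np

def bounds_single_node(theta, p, f, eps, tau):
    # Two-point bounds of Equation (1) on Pr[X_i^2 = 1] for a given epsilon.
    N = len(theta)
    mu = theta @ p                                       # mean of the incoming weighted sum
    delta = 2.0 * np.exp(-2.0 * N * eps**2 / tau**2)     # throw-away probability
    lower = (1.0 - delta) * f(mu - eps)
    upper = (1.0 - delta) * f(mu + eps) + delta
    return max(lower, 0.0), min(upper, 1.0)

def best_eps(theta, p, f, tau, grid=np.linspace(1e-3, 2.0, 200)):
    # Pick epsilon separately for the lower and the upper bound by grid search.
    lowers, uppers = zip(*(bounds_single_node(theta, p, f, e, tau) for e in grid))
    return grid[np.argmax(lowers)], grid[np.argmin(uppers)]
```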

So far we have considered the marginal probability involving a single node in the second layer. We can also compute bounds on the marginal probabilities involving K > 1 nodes in this layer (which without loss of generality we take to be the nodes X_1^2 through X_K^2). This is done by considering the probability that one or more of the weighted sums entering these K nodes in the second layer deviate by more than ε from their means. We can upper bound this probability by Kδ by appealing to the so-called union bound, which simply states that the probability of a union of events is bounded by the sum of their individual probabilities. The union bound allows us to bound marginal probabilities involving multiple variables. For example, consider the marginal probability Pr[X_1^2 = 1, ..., X_K^2 = 1]. Combining the large deviation and union bounds, we find:

(1 - Kδ) ∏_{i=1}^{K} f(μ_i - ε) ≤ Pr[X_1^2 = 1, ..., X_K^2 = 1] ≤ (1 - Kδ) ∏_{i=1}^{K} f(μ_i + ε) + Kδ.   (2)

A number of observations are in order here. First, Equation (2) directly leads to efficient algorithms for computing the upper and lower bounds. Second, although for simplicity we have considered ε-deviations of the same size at each node in the second layer, the same methods apply to different choices of ε_i (and therefore δ_i) at each node. Indeed, variations in the ε_i can lead to significantly tighter bounds, and thus we exploit the freedom to choose different ε_i in the rest of the paper. This results, for example, in bounds of the form:

(1 - Σ_{i=1}^{K} δ_i) ∏_{i=1}^{K} f(μ_i - ε_i) ≤ Pr[X_1^2 = 1, ..., X_K^2 = 1],   where δ_i = 2e^{-2Nε_i²/τ²}.   (3)

The reader is invited to study the small but important differences between this lower bound and the one in Equation (2). Third, the arguments leading to bounds on the marginal probability Pr[X_1^2 = 1, ..., X_K^2 = 1] generalize in a straightforward manner to other patterns of evidence besides all 1's. For instance, again just considering the lower bound, we have:

(1 - Σ_{i=1}^{K} δ_i) ∏_{i: x_i = 0} [1 - f(μ_i + ε_i)] ∏_{i: x_i = 1} f(μ_i - ε_i) ≤ Pr[X_1^2 = x_1, ..., X_K^2 = x_K]   (4)

where the x_i ∈ {0, 1} are arbitrary binary values. Thus together the large deviation and union bounds provide the means to compute upper and lower bounds on the marginal probabilities over nodes in the second layer. Further details and consequences of these bounds for the special case of two-layer networks are given in a companion paper (Kearns & Saul, 1998); our interest here, however, is in the more challenging generalization to multilayer networks.

4 Multilayer Networks: Inference via Induction

In extending the ideas of the previous section to multilayer networks, we face the problem that the nodes in the second layer, unlike those in the first, are not independent. But we can still adopt an inductive strategy to derive bounds on marginal probabilities. The crucial observation is that conditioned on the values of the incoming weighted sums at the nodes in the second layer, the variables {X_i^2} do become independent. More generally, conditioned on these weighted sums all falling "near" their means (an event whose probability we quantified in the last section), the nodes {X_i^2} become "almost" independent. It is exactly this near-independence that we now formalize and exploit inductively to compute bounds for multilayer networks.

The first tool we require is an appropriate generalization of the large deviation bound, which does not rely on precise knowledge of the means of the random variables being summed.

Theorem 1 For all 1 ≤ j ≤ N, let X_j ∈ {0, 1} denote independent binary random variables, and let |θ_j| ≤ τ/N. Suppose that the means are bounded by |E[X_j] - p_j| ≤ Δ_j, where 0 ≤ p_j ∓ Δ_j ≤ 1. Then for all ε > Σ_j |θ_j| Δ_j:

Pr[ |Σ_j θ_j (X_j - p_j)| > ε ] ≤ 2 exp( -(2N/τ²) (ε - Σ_j |θ_j| Δ_j)² ).   (5)

The proof of this result is omitted due to space considerations.
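For concreteness, the tail bound of Equation (5) can be packaged as a small helper that the inductive argument below evaluates repeatedly; this is a sketch assuming NumPy, and its name and interface are illustrative rather than part of the paper.

```python
import numpy as np

def theorem1_tail_bound(theta, Delta, eps, tau):
    """Upper bound of Equation (5) on Pr[|sum_j theta_j (X_j - p_j)| > eps].

    theta: weights theta_j with |theta_j| <= tau / N.
    Delta: per-variable mean uncertainties |E[X_j] - p_j| <= Delta_j.
    eps:   deviation size; the bound is informative only when
           eps > sum_j |theta_j| Delta_j.
    """
    N = len(theta)
    slack = eps - np.abs(theta) @ Delta
    if slack <= 0:
        return 1.0   # theorem does not apply; fall back to the trivial bound
    return min(1.0, 2.0 * np.exp(-(2.0 * N / tau**2) * slack**2))
```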

Now for the induction, consider the nodes in the ℓth layer of the network. Suppose we are told that for every i, the weighted sum Σ_j θ_ij^{ℓ-1} X_j^{ℓ-1} entering into the node X_i^ℓ lies in the interval [μ_i^ℓ - ε_i^ℓ, μ_i^ℓ + ε_i^ℓ], for some choice of the μ_i^ℓ and the ε_i^ℓ. Then the mean of node X_i^ℓ is constrained to lie in the interval [p_i^ℓ - Δ_i^ℓ, p_i^ℓ + Δ_i^ℓ], where

p_i^ℓ = ½ [ f(μ_i^ℓ - ε_i^ℓ) + f(μ_i^ℓ + ε_i^ℓ) ]   (6)

Δ_i^ℓ = ½ [ f(μ_i^ℓ + ε_i^ℓ) - f(μ_i^ℓ - ε_i^ℓ) ].   (7)

Here we have simply run the leftmost and rightmost allowed values for the incoming weighted sums through the transfer function, and defined the interval around the mean of unit X_i^ℓ to be centered around p_i^ℓ. Thus we have translated uncertainties on the incoming weighted sums to layer ℓ into conditional uncertainties on the means of the nodes X_i^ℓ in layer ℓ. To complete the cycle, we now translate these into conditional uncertainties on the incoming weighted sums to layer ℓ+1. In particular, conditioned on the original intervals [μ_i^ℓ - ε_i^ℓ, μ_i^ℓ + ε_i^ℓ], what is the probability that for each i, Σ_j θ_ij^ℓ X_j^ℓ lies inside some new interval [μ_i^{ℓ+1} - ε_i^{ℓ+1}, μ_i^{ℓ+1} + ε_i^{ℓ+1}]? In order to make some guarantee on this probability, we set μ_i^{ℓ+1} = Σ_j θ_ij^ℓ p_j^ℓ and assume that ε_i^{ℓ+1} > Σ_j |θ_ij^ℓ| Δ_j^ℓ. These conditions suffice to ensure that the new intervals contain the (conditional) expected values of the weighted sums Σ_j θ_ij^ℓ X_j^ℓ, and that the new intervals are large enough to encompass the incoming uncertainties. Because these conditions are a minimal requirement for establishing any probabilistic guarantees, we shall say that the [μ_i^ℓ - ε_i^ℓ, μ_i^ℓ + ε_i^ℓ] define a valid set of ε-intervals if they meet these conditions for all 1 ≤ i ≤ N.

Given a valid set of ε-intervals at the (ℓ+1)st layer, it follows from Theorem 1 and the union bound that the weighted sums entering nodes in layer ℓ+1 obey

Pr[ |Σ_j θ_ij^ℓ X_j^ℓ - μ_i^{ℓ+1}| > ε_i^{ℓ+1} for some 1 ≤ i ≤ N ] ≤ Σ_{i=1}^{N} δ_i^{ℓ+1}   (8)

where

δ_i^{ℓ+1} = 2 exp( -(2N/τ²) ( ε_i^{ℓ+1} - Σ_j |θ_ij^ℓ| Δ_j^ℓ )² ).   (9)

In what follows, we shall frequently make use of the fact that the weighted sums Σ_j θ_ij^ℓ X_j^ℓ are bounded by the intervals [μ_i^{ℓ+1} - ε_i^{ℓ+1}, μ_i^{ℓ+1} + ε_i^{ℓ+1}]. This motivates the following definition.

Definition 2 Given a valid set of ε-intervals and binary values {X_i^ℓ = x_i^ℓ} for the nodes in the ℓth layer, we say that the (ℓ+1)st layer of the network satisfies its ε-intervals if |Σ_j θ_ij^ℓ x_j^ℓ - μ_i^{ℓ+1}| < ε_i^{ℓ+1} for all 1 ≤ i ≤ N. Otherwise, we say that the (ℓ+1)st layer violates its ε-intervals.

Suppose that we are given a valid set of ε-intervals and that we sample from the joint distribution defined by the probabilistic f-network. The right hand side of Equation (8) provides an upper bound on the conditional probability that the (ℓ+1)st layer violates its ε-intervals, given that the ℓth layer did not. This upper bound may be vacuous (that is, larger than 1), so let us denote by δ^{ℓ+1} whichever is smaller: the right hand side of Equation (8), or 1; in other words, δ^{ℓ+1} = min{ Σ_{i=1}^{N} δ_i^{ℓ+1}, 1 }.
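The inductive step of Equations (6) through (9) amounts to a simple layer-by-layer sweep. The following sketch propagates a candidate set of ε-intervals through the network and accumulates the per-layer violation probabilities δ^ℓ; it assumes NumPy, enforces validity by flooring each ε_i at Σ_j |θ_ij| Δ_j plus a tiny slack, and its names and interfaces are illustrative.

```python
import numpy as np

def propagate_intervals(weights, biases, f, eps, tau):
    """Propagate a valid set of eps-intervals through a layered f-network.

    weights: list of (N, N) arrays, weights[l][i, j] = theta_ij from layer l+1 to l+2.
    biases:  length-N array of first-layer probabilities p_i.
    eps:     list of length-N arrays; eps[l][i] is the half-width chosen for the
             weighted sum entering node i of layer l+2.
    Returns the per-layer (mu, eps) intervals on the weighted sums and the
    per-layer violation probabilities delta^l of Equation (8).
    """
    N = len(biases)
    p = np.asarray(biases, float)        # layer-1 means p_j^1
    Delta = np.zeros(N)                  # layer-1 uncertainties Delta_j^1 = 0
    mus, deltas = [], []
    for theta, e in zip(weights, eps):
        mu = theta @ p                   # mu_i^{l+1} = sum_j theta_ij p_j^l  (validity)
        floor = np.abs(theta) @ Delta    # validity also needs e_i > sum_j |theta_ij| Delta_j
        e = np.maximum(e, floor + 1e-12)
        d_i = 2.0 * np.exp(-(2.0 * N / tau**2) * (e - floor) ** 2)   # Equation (9)
        deltas.append(min(d_i.sum(), 1.0))                           # delta^{l+1}
        # Translate intervals on weighted sums into intervals on means (Eqs. 6-7).
        p = 0.5 * (f(mu - e) + f(mu + e))
        Delta = 0.5 * (f(mu + e) - f(mu - e))
        mus.append((mu, e))
    return mus, deltas
```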

Since at the ℓth layer the probability of violating the ε-intervals is at most δ^ℓ, we are guaranteed that with probability at least ∏_{ℓ>1} [1 - δ^ℓ], all the layers satisfy their ε-intervals. Conversely, we are guaranteed that the probability that any layer violates its ε-intervals is at most 1 - ∏_{ℓ>1} [1 - δ^ℓ]. Treating this as a throw-away probability, we can now compute upper and lower bounds on marginal probabilities involving nodes at the Lth layer exactly as in the case of nodes at the second layer. This yields the following theorem.

Theorem 2 For any subset {X_1^L, ..., X_K^L} of the outputs of a probabilistic f-network, for any setting x_1, ..., x_K, and for any valid set of ε-intervals, the marginal probability of partial evidence in the output layer obeys:

∏_{ℓ>1} [1 - δ^ℓ] ∏_{i: x_i=1} f(μ_i^L - ε_i^L) ∏_{i: x_i=0} [1 - f(μ_i^L + ε_i^L)] ≤ Pr[X_1^L = x_1, ..., X_K^L = x_K]   (10)

Pr[X_1^L = x_1, ..., X_K^L = x_K] ≤ ∏_{ℓ>1} [1 - δ^ℓ] ∏_{i: x_i=1} f(μ_i^L + ε_i^L) ∏_{i: x_i=0} [1 - f(μ_i^L - ε_i^L)] + ( 1 - ∏_{ℓ>1} [1 - δ^ℓ] ).   (11)

Theorem 2 generalizes our earlier results for marginal probabilities over nodes in the second layer; for example, compare Equation (10) to Equation (4). Again, the upper and lower bounds can be efficiently computed for all common transfer functions.
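Given intervals and violation probabilities produced by a sweep like the one sketched after Equation (9), the bounds of Theorem 2 reduce to a few products. A minimal sketch, again assuming NumPy and the illustrative interfaces introduced above:

```python
import numpy as np

def theorem2_bounds(mus, deltas, f, evidence):
    """Lower and upper bounds of Theorem 2 on Pr[X_1^L = x_1, ..., X_K^L = x_K].

    mus:      list of (mu, eps) pairs from propagate_intervals; the last entry
              gives the intervals on the weighted sums entering the output layer.
    deltas:   per-layer violation probabilities delta^l.
    evidence: dict mapping output index i -> observed value x_i in {0, 1}.
    """
    mu, e = mus[-1]
    keep = np.prod([1.0 - d for d in deltas])   # prod_{l>1} (1 - delta^l)
    low, up = 1.0, 1.0
    for i, x in evidence.items():
        if x == 1:
            low *= f(mu[i] - e[i])
            up *= f(mu[i] + e[i])
        else:
            low *= 1.0 - f(mu[i] + e[i])
            up *= 1.0 - f(mu[i] - e[i])
    return keep * low, keep * up + (1.0 - keep)
```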

5 Rates of Convergence

To demonstrate the power of Theorem 2, we consider how the gap (or additive difference) between these upper and lower bounds on Pr[X_1^L = x_1, ..., X_K^L = x_K] behaves for some crude (but informed) choices of the {ε_i^ℓ}. Our goal is to derive the rate at which these upper and lower bounds converge to the same value as we examine larger and larger networks. Suppose we choose the ε-intervals inductively by defining Δ_j^1 = 0 and setting

ε_i^{ℓ+1} = Σ_j |θ_ij^ℓ| Δ_j^ℓ + τ √( γ ln N / N )

for some γ > 1. From Equations (8) and (9), this choice gives 2N^{1-2γ} as an upper bound on the probability that the (ℓ+1)st layer violates its ε-intervals. Moreover, denoting the gap between the upper and lower bounds in Theorem 2 by G, it can be shown that:

G ≤ 2ατ √( γ ln N / N ) [ (1 - (ατ)^{L-1}) / (1 - ατ) ] Σ_{i=1}^{K} ∏_{j≠i: x_j=1} f(μ_j^L + ε_j^L) ∏_{j≠i: x_j=0} [1 - f(μ_j^L - ε_j^L)]   (12)
      + 2L / N^{2γ-1}.   (13)

Let us briefly recall the definitions of the parameters on the right hand side of this equation: α is the maximal slope of the transfer function f, N is the number of nodes in each layer, K is the number of nodes with evidence, τ is N times the largest weight in the network, L is the number of layers, and γ > 1 is a parameter at our disposal. The first term of this bound essentially has a 1/√N dependence on N, but is multiplied by a damping factor that we might typically expect to decay exponentially with the number K of outputs examined. To see this, simply notice that each of the factors f(μ_j^L + ε_j^L) and [1 - f(μ_j^L - ε_j^L)] is bounded by 1; furthermore, since all the means are bounded, if N is large compared to τ then the ε_i^L are small, and each of these factors is in fact bounded by some value λ < 1. Thus the first term in Equation (13) is bounded by a constant times K λ^{K-1} √( ln(N)/N ). Since it is natural to expect the marginal probability of interest itself to decrease exponentially with K, this is desirable and natural behavior. Of course, in the case of large K, the behavior of the resulting overall bound can be dominated by the second term 2L/N^{2γ-1} of Equation (13). In such situations, however, we can consider larger values of γ, possibly even of order K; indeed, for sufficiently large γ, the first term (which scales like √γ) must necessarily overtake the second one. Thus there is a clear trade-off between the two terms, as well as an optimal value of γ that sets them to be (roughly) the same magnitude. Generally speaking, for fixed K and large N, we observe that the difference between our upper and lower bounds on Pr[X_1^L = x_1, ..., X_K^L = x_K] vanishes as O( √( ln(N)/N ) ).

6 An Algorithm for Fixed Multilayer Networks

We conclude by noting that the specific choices made for the parameters in Section 5 to derive rates of convergence may be far from the optimal choices for a fixed network of interest. However, Theorem 2 directly suggests a natural algorithm for approximate probabilistic inference. In particular, regarding the upper and lower bounds on Pr[X_1^L = x_1, ..., X_K^L = x_K] as functions of the {ε_i^ℓ}, we can optimize these bounds by standard numerical methods. For the upper bound, we may perform gradient descent in the {ε_i^ℓ} to find a local minimum, while for the lower bound, we may perform gradient ascent to find a local maximum. The components of these gradients in both cases are easily computable for all the commonly studied transfer functions. Moreover, the constraint of maintaining valid ε-intervals can be enforced by maintaining a floor on the ε-intervals in one layer in terms of those at the previous one. The practical application of this algorithm to interesting Bayesian networks will be studied in future work.

References

Cooper, G. (1990). Computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence 42:393-405.

Hertz, J., Krogh, A., & Palmer, R. (1991). Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, CA.

Hinton, G., Dayan, P., Frey, B., & Neal, R. (1995). The wake-sleep algorithm for unsupervised neural networks. Science 268:1158-1161.

Jordan, M., Ghahramani, Z., Jaakkola, T., & Saul, L. (1997). An introduction to variational methods for graphical models. In M. Jordan, ed., Learning in Graphical Models. Kluwer Academic.

Kearns, M., & Saul, L. (1998). Large deviation methods for approximate probabilistic inference. In Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence.

Neal, R. (1992). Connectionist learning of belief networks. Artificial Intelligence 56:71-113.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA.
