In: Fourth International Conference on Artificial Neural Networks, Churchill College, University of Cambridge, UK
In: Fourth International Conference on Artificial Neural Networks, Churchill College, University of Cambridge, UK. IEE Conference Publication No. 409, pp. 160-165 (1995)

ON THE RELATIONSHIP BETWEEN BAYESIAN ERROR BARS AND THE INPUT DATA DENSITY

C K I Williams, C Qazaz, C M Bishop and H Zhu
Neural Computing Research Group, Aston University, UK.

ABSTRACT

We investigate the dependence of Bayesian error bars on the distribution of data in input space. For generalized linear regression models we derive an upper bound on the error bars which shows that, in the neighbourhood of the data points, the error bars are substantially reduced from their prior values. For regions of high data density we also show that the contribution to the output variance due to the uncertainty in the weights can exhibit an approximate inverse proportionality to the probability density. Empirical results support these conclusions.

1 INTRODUCTION

When given a prediction, it is also very useful to be given some idea of the "error bars" associated with that prediction. Error bars arise naturally in a Bayesian treatment of neural networks and are made up of two terms, one due to the posterior weight uncertainty, and the other due to the intrinsic noise in the data. As the two contributions are independent, we have

    \sigma_y^2(x) = \sigma_w^2(x) + \sigma_\nu^2(x)    (1)

where \sigma_w^2(x) is the variance of the output due to weight uncertainty and \sigma_\nu^2(x) is the variance of the intrinsic noise. Under the assumption that the posterior in weight space can be approximated by a Gaussian (MacKay (1)), we have

    \sigma_w^2(x) = g^T(x) A^{-1} g(x)    (2)

where A is the Hessian matrix of the model and g = \partial y(x; w) / \partial w is the vector of the derivatives of the output with respect to the weight parameters in the network. A contains contributions from both the prior distribution on the weights and the effect of the training data. Although the weight uncertainty component of the error bar is given by equation 2, the dependence of this quantity on the location of the training points is not at all obvious.
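To make this dependence concrete, here is a minimal numerical sketch (not from the paper; the two-parameter model y(x) = w0 + w1*x and all numbers are assumptions chosen for illustration). It builds the Hessian from a prior term plus an outer-product data term, evaluates the weight-uncertainty variance g^T(x) A^{-1} g(x), and shows that it is small near the training inputs and reverts towards the prior variance far away:

```python
# Hedged sketch of the variance decomposition: sigma_y^2 = sigma_w^2 + sigma_nu^2,
# with sigma_w^2(x) = g(x)^T A^{-1} g(x) and A = S + (1/sigma_nu^2) sum_n g(x_n) g(x_n)^T.
# Model, prior strength, noise level and inputs below are all illustrative assumptions.

def g(x):                      # derivative of y(x; w) = w0 + w1*x w.r.t. (w0, w1)
    return (1.0, x)

def inv2(m):                   # inverse of a 2x2 matrix given as ((a, b), (c, d))
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def quad(m, v):                # v^T m v for a 2x2 matrix and a 2-vector
    (a, b), (c, d) = m
    v0, v1 = v
    return v0 * (a * v0 + b * v1) + v1 * (c * v0 + d * v1)

alpha, sigma_nu2 = 0.1, 0.05   # assumed prior precision and noise variance
xs = [-0.5, 0.0, 0.5]          # training inputs clustered near the origin

# Hessian: prior term S = alpha*I plus the outer-product data term.
A = [[alpha, 0.0], [0.0, alpha]]
for xn in xs:
    gn = g(xn)
    for i in range(2):
        for j in range(2):
            A[i][j] += gn[i] * gn[j] / sigma_nu2
A = (tuple(A[0]), tuple(A[1]))

S_inv = inv2(((alpha, 0.0), (0.0, alpha)))
A_inv = inv2(A)

def sigma_y2(x):               # total predictive variance, equation-(1) style
    return quad(A_inv, g(x)) + sigma_nu2

var_near = quad(A_inv, g(0.0))    # weight-uncertainty term near the data
var_far = quad(A_inv, g(10.0))    # ... and far from the data
prior_far = quad(S_inv, g(10.0))  # prior variance at the distant point
print(var_near, var_far, prior_far)
```

With these illustrative numbers the weight-uncertainty term at the data is below even the noise variance, while far from the data it grows back towards (but never exceeds) the prior variance.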
Intuitively we would expect the error bars from the prior (i.e. before any data is seen) to be quite large, and that the effect of the training data would be to reduce the magnitude of the error bars for those regions of the input space close to the data points, while leaving large error bars further away. (If the network used is not the correct generative model for the data there will be a third component due to model mis-specification; we do not discuss this further in this paper.) The purpose of this paper is to provide theoretical insights to support this intuition. In particular, our analysis focusses on generalized linear regression (such as radial basis function networks with fixed basis function parameters) and allows us to quantify the extent of the reduction and the length scale over which it occurs. We also show that the relationship

    \sigma_w^2(x) \simeq \sigma_\nu^2 [N p(x) V(x)]^{-1}

holds approximately, where p(x) is the density of the data in the input space, N is the number of data points in the training set and V(x) is a function of x that measures a volume in the input space. This relationship pertains to the "high-data" limit where the effect of the data overwhelms the prior in the Hessian.

2 GENERALIZED LINEAR REGRESSION

Consider a generalized linear regression (GLR) model of the form

    y(x) = \phi^T(x) w = \sum_{j=1}^{m} w_j \phi_j(x)    (3)

where j = 1, ..., m labels the basis functions {\phi_j} of the model. Given a data set D = ((x_1, t_1), (x_2, t_2), ..., (x_N, t_N)), a squared error function with noise variance \sigma_\nu^2 and a regularizer of the form \frac{1}{2} w^T S w, the posterior mean value of the weights \hat{w} is the choice of w that minimizes the quadratic form

    \frac{1}{2\sigma_\nu^2} \sum_i \left( t_i - \sum_j w_j \phi_j(x_i) \right)^2 + \frac{1}{2} w^T S w    (4)

so that \hat{w} is the solution of

    (B + S) \hat{w} = \frac{1}{\sigma_\nu^2} \Phi^T t    (5)

(In this section we assume that \sigma_\nu^2 is independent of x. This assumption can be easily relaxed, but at the expense of somewhat more complicated notation.)
where \Phi is the n \times m design matrix

    \Phi = \begin{pmatrix} \phi_1(x_1) & \phi_2(x_1) & \cdots & \phi_m(x_1) \\ \phi_1(x_2) & \phi_2(x_2) & \cdots & \phi_m(x_2) \\ \vdots & \vdots & & \vdots \\ \phi_1(x_n) & \phi_2(x_n) & \cdots & \phi_m(x_n) \end{pmatrix}    (6)

B = \frac{1}{\sigma_\nu^2} \Phi^T \Phi and t is the vector of targets. Writing A = B + S, we find

    \hat{y}(x) = \phi^T(x) \hat{w} = \frac{1}{\sigma_\nu^2} \phi^T(x) A^{-1} \Phi^T t \equiv k^T(x) t    (7)

where \hat{y}(x) is the function obtained from equation 3 using \hat{w} as the weight vector. Equation 7 defines the effective kernel k(x) and makes it clear that \hat{y}(x) can be written as a linear combination of the target values, i.e. it is a linear smoother (see, e.g. Hastie and Tibshirani (2)). The contribution of the uncertainty of the weights to the variance of the prediction is given from equation 2 by

    \sigma_w^2(x) = \phi^T(x) A^{-1} \phi(x)    (8)

Note that for generalized linear regression this expression is exact, and that the error bars (given \sigma_\nu^2) are independent of the targets.

3 ERROR BARS FOR GLR

[Figure 1: A schematic illustration of the effect of one data point on \sigma_y^2(x). The posterior variance is reduced from its prior level in the neighbourhood of the data point (+), but remains above the noise level.]

In this section we analyze the response of the prior variance to the addition of the data points. In particular we show that the effect of a single data point is to pull the \sigma_y^2(x) surface down to a value less than 2\sigma_\nu^2(x) at and nearby to the data point, and that the length scale over which this effect operates is determined by the prior covariance function

    C(x, x') = \phi^T(x) A_0^{-1} \phi(x')    (9)

where A_0 = S. (The analysis in this section permits the noise level to vary as a function of x.)

The main tool used in this analysis is the effect of adding just one data point. A schematic illustration of this effect is shown in Figure 1. The variance due to the prior is quite large (and roughly constant over x-space). Adding a single data point pulls down the variance in its neighbourhood (but not as far as the \sigma_\nu^2 limit). Figure 1 is relevant because we can show (see Appendix A.1) that \sigma_y^2(x), when all data points are used to compute the Hessian, is never greater than \sigma_y^2(x) when any subset of the data points are used, and hence the surface pertaining to any particular data point is an upper bound on the overall surface.
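The subset bound just invoked can be checked numerically. The following is a hedged sketch (an assumed two-parameter model with made-up inputs, not the paper's experiment): it builds the Hessian for a one-point subset and for the full data set, and verifies that the full-data variance surface never exceeds the subset surface at any test point:

```python
# Illustrative check of the subset bound: adding data points to the Hessian of a
# generalized linear regression model can never increase sigma_w^2(x) anywhere.
# Basis phi(x) = (1, x), prior strength and inputs are assumptions for the sketch.

def inv2(m):                   # inverse of a 2x2 matrix given as ((a, b), (c, d))
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def quad(m, v):                # v^T m v for a 2x2 matrix and a 2-vector
    (a, b), (c, d) = m
    v0, v1 = v
    return v0 * (a * v0 + b * v1) + v1 * (c * v0 + d * v1)

def hessian(points, alpha=0.1, sigma_nu2=0.05):
    # A = S + (1/sigma_nu2) * sum_n phi(x_n) phi(x_n)^T with S = alpha*I
    A = [[alpha, 0.0], [0.0, alpha]]
    for x in points:
        phi = (1.0, x)
        for i in range(2):
            for j in range(2):
                A[i][j] += phi[i] * phi[j] / sigma_nu2
    return (tuple(A[0]), tuple(A[1]))

subset = [0.3]
full = [0.3, -1.2, 2.0]        # the subset plus two extra points
A_sub_inv = inv2(hessian(subset))
A_full_inv = inv2(hessian(full))

# sigma_w^2(x) = phi(x)^T A^{-1} phi(x); the full-data surface should be an
# everywhere lower (or equal) bound on the subset surface.
gap = [quad(A_sub_inv, (1.0, x)) - quad(A_full_inv, (1.0, x))
       for x in [-3.0, -1.0, 0.0, 0.3, 1.0, 4.0]]
print(min(gap))
```

The non-negativity of every gap is exactly the statement that the one-point surface upper-bounds the overall surface.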
To obtain a bound on the depth of the dip, consider the case when there is only one data point (at x = x_1), so that the Hessian is given by A_1 = A_0 + \phi(x_1)\phi^T(x_1)/\sigma_\nu^2(x_1). Using the identity

    (M + vv^T)^{-1} = M^{-1} - \frac{(M^{-1}v)(v^T M^{-1})}{1 + v^T M^{-1} v}    (10)

it is easy to show that

    \sigma_{w|x_1}^2(x_1) = \sigma_\nu^2(x_1) \frac{r}{1 + r}    (11)

where \sigma_{w|x_1}^2 denotes the posterior weight uncertainty surface due to a data point at x_1 and

    r = \frac{\phi^T(x_1) A_0^{-1} \phi(x_1)}{\sigma_\nu^2(x_1)}    (12)

i.e. r is the ratio of the prior to noise variances at the point x_1. For any positive value of z, the function z/(1 + z) lies between 0 and 1, hence we see that the \sigma_w^2 contribution to the error bars must always be less than \sigma_\nu^2(x_1) at a data point. Typically the noise variance is much smaller than the prior variance, so r \gg 1. Further evidence that \sigma_w^2 at any data point is of the order of \sigma_\nu^2(x) is provided by the calculation in appendix A.2 which shows that the average of \sigma_w^2(x_n) at the data points is less than m\sigma_\nu^2/N, where m is the number of weights in the model and N is the number of data points. For a single data point at x_1, we can use equation 10 to show that

    \sigma_{w|x_1}^2(x) = C(x, x) - \frac{(C(x, x_1))^2}{\sigma_\nu^2(x_1) + C(x_1, x_1)}    (13)

Hence the width of the depression in the variance surface is related to the characteristic length scale of the
prior covariance function C(x, x'). It is also possible to show that if a test point x has zero covariance C(x, x_i) with all of the training points {x_i}, then its posterior variance will be equal to its prior variance. We are currently exploring the properties of C(x, x') for different weight priors and choices of basis functions. However, we note that a simple diagonal prior S = \alpha I as used by some authors is not in general a very sensible prior, because if the type of basis functions used (e.g. Gaussians, tanh functions etc.) is changed, then the covariance structure of the prior also changes. More sensibly, the weight prior should be chosen so as to approximate some desired prior covariance function C(x, x').

4 DENSITY DEPENDENCE OF \sigma_w^2(x)

As we have already noted, error bars on network predictions would be expected to be relatively large in regions of input space for which there is little data, and smaller in regions of high data density. In this section, we establish an approximate proportionality between the variance due to weight uncertainty and the inverse of the probability density of training data, valid in regions of high data density. A relationship of this kind was conjectured in Bishop (3).

We first consider a special case of the class of generalized linear models where the basis functions are non-overlapping bin (or "top-hat") activation functions. Let the i-th basis function have height h_i and a d-dimensional "base area" of V_i, where d is the dimensionality of x. If we choose a diagonal prior (S = \alpha I) then the Hessian is diagonal and thus easy to invert:

    B_{ij} = \frac{1}{\sigma_\nu^2} \sum_q \phi_i(x_q) \phi_j(x_q) = \begin{cases} n_i h_i^2 / \sigma_\nu^2 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}    (14)

where n_i is the number of data points falling in bin i. From equation (8) the error bars associated with a point x which falls into the i-th bin are given by

    \sigma_w^2(x) = \frac{h_i^2}{\alpha + n_i h_i^2 / \sigma_\nu^2}    (15)

As usual, the effect of the prior is to reduce the size of the error bar compared to the case where it is not present. In the limit of \alpha \to 0 we have

    \sigma_w^2(x) = \frac{\sigma_\nu^2}{n_i} = \frac{\sigma_\nu^2}{N V_i \hat{p}(x)}    (16)

where N is the total number of data points and \hat{p}(x) is the histogram estimate of the density inside the bin containing x.
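The bin-basis result above reduces to elementary arithmetic, which the following sketch makes explicit (the bins, data and noise level are illustrative assumptions, not taken from the paper): in the small-prior limit the error bar in each occupied bin is the noise variance divided by the bin count, i.e. inversely proportional to the histogram density estimate:

```python
# Hedged sketch of the non-overlapping "bin" basis: with a diagonal prior the
# Hessian is diagonal, and as alpha -> 0 the error bar in bin i is sigma_nu2/n_i,
# which equals sigma_nu2 / (N * V * p_hat). All numbers here are illustrative.

sigma_nu2 = 0.04
h, V = 1.0, 0.5                      # bin height and base "area" (1-d width)
edges = [0.0, 0.5, 1.0, 1.5, 2.0]    # four bins covering [0, 2)
data = [0.1, 0.2, 0.3, 0.4, 0.7, 0.8, 1.2, 1.7]
N = len(data)

counts = [sum(1 for x in data if lo <= x < hi)
          for lo, hi in zip(edges[:-1], edges[1:])]

def var_w(i, alpha):
    # sigma_w^2 in bin i: h^2 / (alpha + n_i * h^2 / sigma_nu2)
    return h * h / (alpha + counts[i] * h * h / sigma_nu2)

def hist_density(i):
    # histogram density estimate inside bin i: n_i / (N * V)
    return counts[i] / (N * V)

# In the alpha -> 0 limit: dense bins get small error bars, sparse bins large ones.
limit = [var_w(i, 0.0) for i in range(4)]
predicted = [sigma_nu2 / (N * V * hist_density(i)) for i in range(4)]
print(limit, predicted)
```

A non-zero prior only shrinks the bars further, consistent with the remark that the prior reduces the size of the error bar.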
Equation (16) demonstrates that for this kind of model the error bars are inversely proportional to the input density and to the volume factor V_i. (This analysis can easily be extended to arbitrary non-constant basis functions as long as they do not overlap.) It also shows that we can understand the reduction in the variance \sigma_y^2(x) in regions of high density as the 1/n effect for the variance of the mean of n i.i.d. Gaussian variables each of which has variance \sigma_\nu^2.

The aim of the remainder of this section is to show how results similar to those for the bin basis functions can be obtained, in certain circumstances, for generalized linear regression models, i.e. that the error bars will be inversely proportional to p(x) and an area factor V(x). The key idea needed is that of an effective kernel, which we now describe. As noted in equation 7, we can write \hat{y}(x) = k^T(x) t, where k(x) is the effective kernel. To take this analysis further it is helpful to think of \hat{y}(x) = k^T(x) t = \sum_i k_i t_i as an approximation to the integral \hat{y}(x) = \int K(z, x) t(z) dz, where K(z, x) (regarded as a function of z) is the effective kernel for the point x and t(z) is a "target function". Following similar reasoning we obtain

    B = \frac{1}{\sigma_\nu^2} \sum_q \phi(x_q) \phi^T(x_q) \simeq \frac{N}{\sigma_\nu^2} \int p(x) \phi(x) \phi^T(x) dx    (17)

If the original basis functions are linearly combined to produce a new set \tilde{\phi} = C\phi, then the matrix C can be chosen so that \int p(x) \tilde{\phi}_i(x) \tilde{\phi}_j(x) dx = \delta_{ij}, where \delta_{ij} is the Kronecker delta. From now on it is assumed that we are working with the orthonormal basis functions (i.e. the tildes are omitted) and that B = (N/\sigma_\nu^2) I. Ignoring the weight prior we obtain

    \hat{w} = A^{-1} \frac{1}{\sigma_\nu^2} \Phi^T t = \frac{1}{N} \Phi^T t    (18)

However,

    (\Phi^T t)_i = \sum_q \phi_i(x_q) t(x_q) \simeq N \int \phi_i(z) p(z) t(z) dz    (19)

and so

    \hat{y}(x) = \frac{1}{N} \phi^T(x) \Phi^T t    (20)
               = \int \left\{ \sum_i \phi_i(x) \phi_i(z) p(z) \right\} t(z) dz    (21)
               = \int K(z, x) t(z) dz    (22)

We can also show that K(z, x) is the projection of the delta function onto the basis space {\psi_i}, where \psi_i(x) = \phi_i(x) p(x), and that if a constant (bias) function is one of the original basis functions (before orthonormalization), then \int K(z, x) dz = 1. The fact that K(z, x) is an approximation to the delta
function suggests that as the number of basis functions increases the effective kernel should become more tightly peaked and concentrated around x.

[Figure 2 appears here; its axes are labelled "probability density" (panel A) and "noise variance/(N*variance)" (panel B), with legend entries Gaussian, Sigmoid, Polynomial and Network. The caption is given below.]

We now turn to the variance of the generalized linear model. Using orthonormal basis functions, the error bar at x is given by \sigma_w^2(x) = (\sigma_\nu^2/N) \phi^T(x) \phi(x). However, this can be rewritten in terms of the effective kernel as

    \sigma_w^2(x) = \frac{\sigma_\nu^2}{N} \int \frac{K^2(z, x)}{p(z)} dz    (23)

using the orthonormality properties. If K(z, x) is sharply peaked around x (i.e. it looks something like a Gaussian) then the p(z) in the denominator can be pulled through the integral sign as p(x). Also, \int K^2(z, x) dz measures the inverse base area of K(z, x); for example, for a one dimensional Gaussian with standard deviation \sigma centered at x we find that \int K^2(z, x) dz = 1/(2\sqrt{\pi}\sigma). Defining

    \int K^2(z, x) dz \equiv \frac{1}{V(x)}    (24)

we can write

    \sigma_w^2(x) \simeq \frac{\sigma_\nu^2}{N p(x) V(x)}    (25)

By extending the analysis of appendix A.2 to the continuous case we obtain

    \int \sigma_w^2(x) p(x) dx = \frac{\gamma \sigma_\nu^2}{N}    (26)

where \gamma is the effective number of parameters in the model (1), showing that we would expect \sigma_w^2(x) to be larger for a model with more parameters.

Under the assumption that K(z, x) is sharply peaked about x we have obtained a result in equation 25 similar to equation 16 for the bin basis functions. We will now present evidence to show that this relationship holds experimentally. The first experiment has a one dimensional input space. The probability density from which the data was drawn is shown in Figure 2(A). Figure 2(B) shows that for a range of GLR models (and for a two-layer perceptron) there is a close relationship between \sigma_w^2(x) and the density, indicating that V(x) is roughly constant in the high density regions for these models. This conclusion is backed up by Figure 4, which plots V^{-1}(x) = \int K^2(z, x) dz against x. The log-log plot in Figure 3 also indicates that the relationship \sigma_w^2(x) \propto p^{-1}(x) holds quite reliably, especially for areas with high data density.

It is interesting to note that the error bar \sigma_w^2(x) can also be obtained from the finite-dimensional effective kernel defined by \hat{y}(x) = k^T(x) t.
Using the assumption that each t_i has independent, zero-mean noise of variance \sigma_\nu^2, we find that the variance of the linear combination \hat{y}(x) is \sigma_w^2(x) = \sigma_\nu^2 k^T k, which can easily be shown to be equivalent to \sigma_w^2(x) = (\sigma_\nu^2/N) \phi^T(x) \phi(x) when the weight prior is neglected.

[Figure 2: (A) A mixture of two Gaussian densities, from which data points were drawn for the experiments. (B) shows the (scaled) inverse variance against x for three generalized linear regression (GLR) models and a neural network. The GLR models used Gaussian, sigmoid and polynomial basis functions respectively, and each model consisted of 16 basis functions and a bias and was trained on data points drawn from this density. (B) also shows the inverse variance for a two layer perceptron with two hidden units. The net was trained on a data set with inputs drawn from the density shown in panel (A) and targets generated from sin(x) with added zero-mean Gaussian noise. For all four models the similarity between the inverse variance for these models and the plot of the density is striking.]

Figure 2(B) also shows that the dependence of the overall magnitude of \sigma_w^2 on the number of effective parameters described in equation 26 holds; the two-layer perceptron, which has only seven weights compared to the 16 in the GLR models, has a correspondingly larger inverse variance.

Some effective kernels for the GLR model with a bias and 16 Gaussian basis functions of standard deviation 0.5, spaced equally across the input range, are shown in Figures 5 and 6. (Similar {\psi_i} and kernels are obtained for sigmoidal and polynomial basis functions.) The kernels in Figure 5 correspond to areas of high density and show a strong, narrow single peak. For regions of low density Figure 6 shows that the kernels are much wider and more oscillatory, indicating that target values from a wide range of x values are used to compute \hat{y}(x). As the widths of the kernels in the low density regions
are greater than the length scale of the variation of the density, we would expect the approximation used in equation 25 to break down at this point. We have conducted several other experiments with one and two dimensional input spaces which produce similar results to those shown in the log-log plot, Figure 3, including a two-layer perceptron which learned to approximate a function of two inputs.

[Figure 3: Plot of the log inverse density of the input data against the log of \sigma_w^2(x) for a generalized linear model with 16 Gaussian basis functions. Note that the points lie close to the line with slope 1, indicating that \sigma_w^2(x) \propto p^{-1}(x).]

[Figure 4: Plot of V^{-1}(x) = \int K^2(z, x) dz against x for a GLR model with 16 Gaussian basis functions spaced equally across the input range, and a bias. Note that the plot is roughly constant in regions of high density.]

[Figure 5: Effective kernels at two values of x corresponding to high density regions, as shown in Figure 2(A). See text for further discussion.]

While this relationship between \sigma_w^2(x) and the input data density is interesting, it should be noted that its validity is limited at best to regions of high data density. Furthermore, in such regions the contribution to the error bars from \sigma_w^2(x) is dwarfed by that from the noise term. This can be seen in the case of non-overlapping basis functions from equation 16. More generally we can consider the extension of the result to the case of n data points all located at x_1. This leads to

    \sigma_{w|nx_1}^2(x_1) \simeq \sigma_\nu^2 \frac{r}{1 + nr} \approx \frac{\sigma_\nu^2}{n}    (27)

again indicating that for regions of high data density the noise term will dominate.

5 DISCUSSION

In this paper we have analyzed the behaviour of the Bayesian error bars for generalized linear regression models. For the case of a single isolated data point we have shown that the error bar is pulled down close to the noise level, and that the length scale over which this effect occurs is characterized by the prior covariance function.
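The co-located-points result quoted above, \sigma_\nu^2 r/(1 + nr), reduces to scalar arithmetic for a model with a single basis function. The following sketch (all values are assumptions for illustration) confirms that the direct computation \phi^2(x_1)/A_n agrees with that formula and approaches \sigma_\nu^2/n once nr is large:

```python
# Scalar sketch of n data points all at x_1, for a one-basis-function model:
# sigma_w^2(x_1) = phi(x_1)^2 / A_n with A_n = A_0 + n*phi(x_1)^2/sigma_nu2,
# which equals sigma_nu2 * r / (1 + n*r). Numbers below are illustrative.

s = 0.2                  # prior precision A_0 = S (scalar model, assumed)
sigma_nu2 = 0.05
phi1 = 1.3               # phi(x_1), assumed
r = (phi1 * phi1 / s) / sigma_nu2    # ratio of prior to noise variance at x_1

def var_direct(n):
    # direct evaluation of phi(x_1)^2 / A_n
    return phi1 * phi1 / (s + n * phi1 * phi1 / sigma_nu2)

def var_formula(n):
    # the closed form sigma_nu2 * r / (1 + n*r)
    return sigma_nu2 * r / (1.0 + n * r)

vals = [(var_direct(n), var_formula(n)) for n in (1, 5, 50)]
print(vals)
print(var_direct(50) * 50)   # close to sigma_nu2 once n*r >> 1
```

So at a heavily sampled location the weight-uncertainty term scales like \sigma_\nu^2/n, which is why the noise term dominates the error bar there.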
We have also shown theoretically that, in regions of high data density, the contribution to the output variance due to the uncertainty in the weights can exhibit an approximate inverse proportionality to the data density. These findings have been supported by numerical simulation. Also, we have noted that, in such high-density regions, this contribution to the variance will be insignificant compared to the contribution arising from the noise term.

Although much of the theoretical analysis has been performed for generalized linear regression models, there is empirical evidence that similar results hold also for multi-layer networks. Furthermore, if the outputs of the network have linear activation functions, then under least-squares training it is effectively a generalized linear regression model with adaptive basis functions. It is therefore a linear smoother with \hat{y}(x) = k^T(x) t, and hence the result that \sigma_w^2 = \sigma_\nu^2 k^T k will still hold. Other results, including the expression 36 derived in Appendix A.2, also hold for general non-linear networks, provided we make the usual Gaussian approximation for the posterior weight distribution, and the outer-product approximation to the Hessian.

One potentially important limitation of the models considered in this paper (and indeed of the models considered by most authors) is that the noise variance is assumed to be a constant, independent of x. To understand why this assumption may be particularly restrictive, consider the situation in which there is a lot of data in one region of input space and a single data point in another region. The estimate of the noise variance, which we shall assume to be relatively small, will be dominated by the high density region. However, as we have seen, the error bar will be pulled down to less than 2\sigma_\nu^2 in the neighbourhood of the isolated data point. The model is therefore highly confident of the regression function (i.e. the most probable interpolant) in this region even though there is only a single data point present! If, however, we relax the assumption of a constant \sigma_\nu^2 then we see that in the neighbourhood of the isolated data point there is little evidence to suggest a small value of \sigma_\nu^2 and so we would expect much larger error bars. We are currently investigating models in which \sigma_\nu^2(x) is adapted to the data.

[Figure 6: Effective kernels at two values of x corresponding to low density regions of the input space, as shown in Figure 2(A). Note that the density function seems to define an "envelope" for the lower kernel; even though x may be in a low density region, the magnitude of K(z, x) is largest in the high density regions. See text for further discussion.]

Acknowledgements

This work was mainly supported by EPSRC grant GR/J755 (CW) and by an EPSRC post-graduate scholarship to CQ.

APPENDICES

A.1

In this appendix we show that for generalized linear regression, \sigma_{y|D}^2(x) \le \sigma_{y|T}^2(x), where D is the full data set ((x_1, t_1), ..., (x_N, t_N)) and T is a subset of this data set. We first note that as \sigma_\nu^2(x) is equal in both cases, we are only concerned about the relative contributions from the weight uncertainty to the overall variance. The key to the proof is to decompose the Hessian A into two parts, A_1 and A_2, where

    A_1 = A_0 + \sum_{q \in T} \frac{1}{\sigma_\nu^2} \phi_q \phi_q^T,    A_2 = \sum_{q \notin T} \frac{1}{\sigma_\nu^2} \phi_q \phi_q^T    (28)

and A_0 = S. Note that A_1 and A_2 are symmetric non-negative definite, and hence A_1^{-1} and A_2^{-1} are also (using the Moore-Penrose pseudo-inverse if necessary). The matrix identity

    (A_1 + A_2)^{-1} = A_1^{-1} - A_1^{-1} (A_1^{-1} + A_2^{-1})^{-1} A_1^{-1}    (29)

implies that for any vector v

    v^T (A_1 + A_2)^{-1} v = v^T A_1^{-1} v - (A_1^{-1} v)^T (A_1^{-1} + A_2^{-1})^{-1} (A_1^{-1} v)    (30)

From the non-negative definite condition we see that the second term in equation 30 is always non-negative, and hence

    v^T A_1^{-1} v \ge v^T (A_1 + A_2)^{-1} v    (31)

Substituting \phi(x) for v completes the proof.

A.2

In this appendix we show that \bar{\sigma}_w^2, the average value of \sigma_w^2(x) evaluated at the data points, is equal to \gamma \sigma_\nu^2 / N, where \gamma (\le m) is the effective number of parameters in the model (1):

    \bar{\sigma}_w^2 = \frac{1}{N} \sum_n \sigma_w^2(x_n)    (32)
                     = \frac{1}{N} \sum_n \phi^T(x_n) A^{-1} \phi(x_n)    (33)
                     = \frac{1}{N} tr[\Phi A^{-1} \Phi^T]    (34)
                     = \frac{\sigma_\nu^2}{N} tr(B A^{-1})    (35)
                     = \frac{\gamma \sigma_\nu^2}{N}    (36)

where \gamma = tr(B A^{-1}).

References

1. MacKay D. J. C., 1992, "Bayesian Interpolation", Neural Computation, 4(3), 415-447.
2. Hastie T. J. and Tibshirani R. J., 1990, "Generalized Additive Models", Chapman and Hall.
3. Bishop C. M., 1994, "Novelty detection and neural network validation", IEE Proceedings: Vision, Image and Signal Processing, 141, 217-222.
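As a closing numerical note (an illustration under an assumed two-parameter model, not part of the original paper), the averaged-variance identity of Appendix A.2 can be verified directly: the mean of \sigma_w^2(x_n) over the training inputs equals \gamma \sigma_\nu^2 / N with \gamma = tr(B A^{-1}):

```python
# Hedged sketch verifying the trace identity: average sigma_w^2 at the data
# points equals gamma * sigma_nu2 / N, gamma = tr(B A^{-1}) (effective number
# of parameters). Basis phi(x) = (1, x); all numbers are illustrative.

def inv2(m):                   # inverse of a 2x2 matrix given as ((a, b), (c, d))
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def matmul2(p, q):             # product of two 2x2 matrices
    return tuple(tuple(sum(p[i][k] * q[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

alpha, sigma_nu2 = 0.3, 0.1
xs = [-1.0, -0.2, 0.4, 1.1, 2.5]
N = len(xs)

# B = (1/sigma_nu2) * Phi^T Phi; A = B + S with S = alpha*I.
B = [[0.0, 0.0], [0.0, 0.0]]
for x in xs:
    phi = (1.0, x)
    for i in range(2):
        for j in range(2):
            B[i][j] += phi[i] * phi[j] / sigma_nu2
A = ((B[0][0] + alpha, B[0][1]), (B[1][0], B[1][1] + alpha))
A_inv = inv2(A)

BA_inv = matmul2((tuple(B[0]), tuple(B[1])), A_inv)
gamma = BA_inv[0][0] + BA_inv[1][1]          # tr(B A^{-1}), between 0 and m = 2

def var_w(x):
    phi = (1.0, x)
    return sum(phi[i] * A_inv[i][j] * phi[j] for i in range(2) for j in range(2))

avg = sum(var_w(x) for x in xs) / N
print(avg, gamma * sigma_nu2 / N)
```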
More informationDETERMINATION OF UNCERTAINTY ASSOCIATED WITH QUANTIZATION ERRORS USING THE BAYESIAN APPROACH
Proceedngs, XVII IMEKO World Congress, June 7, 3, Dubrovn, Croata Proceedngs, XVII IMEKO World Congress, June 7, 3, Dubrovn, Croata TC XVII IMEKO World Congress Metrology n the 3rd Mllennum June 7, 3,
More informationTransfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system
Transfer Functons Convenent representaton of a lnear, dynamc model. A transfer functon (TF) relates one nput and one output: x t X s y t system Y s The followng termnology s used: x y nput output forcng
More informationwhere the sums are over the partcle labels. In general H = p2 2m + V s(r ) V j = V nt (jr, r j j) (5) where V s s the sngle-partcle potental and V nt
Physcs 543 Quantum Mechancs II Fall 998 Hartree-Fock and the Self-consstent Feld Varatonal Methods In the dscusson of statonary perturbaton theory, I mentoned brey the dea of varatonal approxmaton schemes.
More informationChapter 9: Statistical Inference and the Relationship between Two Variables
Chapter 9: Statstcal Inference and the Relatonshp between Two Varables Key Words The Regresson Model The Sample Regresson Equaton The Pearson Correlaton Coeffcent Learnng Outcomes After studyng ths chapter,
More informationMIMA Group. Chapter 2 Bayesian Decision Theory. School of Computer Science and Technology, Shandong University. Xin-Shun SDU
Group M D L M Chapter Bayesan Decson heory Xn-Shun Xu @ SDU School of Computer Scence and echnology, Shandong Unversty Bayesan Decson heory Bayesan decson theory s a statstcal approach to data mnng/pattern
More informationEcon107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)
I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes
More informationU-Pb Geochronology Practical: Background
U-Pb Geochronology Practcal: Background Basc Concepts: accuracy: measure of the dfference between an expermental measurement and the true value precson: measure of the reproducblty of the expermental result
More informationExplaining the Stein Paradox
Explanng the Sten Paradox Kwong Hu Yung 1999/06/10 Abstract Ths report offers several ratonale for the Sten paradox. Sectons 1 and defnes the multvarate normal mean estmaton problem and ntroduces Sten
More informationMaximum Likelihood Estimation (MLE)
Maxmum Lkelhood Estmaton (MLE) Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 01 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y
More informationChapter 6. Supplemental Text Material
Chapter 6. Supplemental Text Materal S6-. actor Effect Estmates are Least Squares Estmates We have gven heurstc or ntutve explanatons of how the estmates of the factor effects are obtaned n the textboo.
More informationSupporting Information
Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationStatistical Inference. 2.3 Summary Statistics Measures of Center and Spread. parameters ( population characteristics )
Ismor Fscher, 8//008 Stat 54 / -8.3 Summary Statstcs Measures of Center and Spread Dstrbuton of dscrete contnuous POPULATION Random Varable, numercal True center =??? True spread =???? parameters ( populaton
More informationModule 3: Element Properties Lecture 1: Natural Coordinates
Module 3: Element Propertes Lecture : Natural Coordnates Natural coordnate system s bascally a local coordnate system whch allows the specfcaton of a pont wthn the element by a set of dmensonless numbers
More informationLogistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI
Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton
More informationLOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin
Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence
More informationFor now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.
Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson
More informationNon-linear Canonical Correlation Analysis Using a RBF Network
ESANN' proceedngs - European Smposum on Artfcal Neural Networks Bruges (Belgum), 4-6 Aprl, d-sde publ., ISBN -97--, pp. 57-5 Non-lnear Canoncal Correlaton Analss Usng a RBF Network Sukhbnder Kumar, Elane
More informationLinear Feature Engineering 11
Lnear Feature Engneerng 11 2 Least-Squares 2.1 Smple least-squares Consder the followng dataset. We have a bunch of nputs x and correspondng outputs y. The partcular values n ths dataset are x y 0.23 0.19
More informationEPR Paradox and the Physical Meaning of an Experiment in Quantum Mechanics. Vesselin C. Noninski
EPR Paradox and the Physcal Meanng of an Experment n Quantum Mechancs Vesseln C Nonnsk vesselnnonnsk@verzonnet Abstract It s shown that there s one purely determnstc outcome when measurement s made on
More informationResearch Article Green s Theorem for Sign Data
Internatonal Scholarly Research Network ISRN Appled Mathematcs Volume 2012, Artcle ID 539359, 10 pages do:10.5402/2012/539359 Research Artcle Green s Theorem for Sgn Data Lous M. Houston The Unversty of
More informationChapter 13: Multiple Regression
Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to
More informationChapter 3 Differentiation and Integration
MEE07 Computer Modelng Technques n Engneerng Chapter Derentaton and Integraton Reerence: An Introducton to Numercal Computatons, nd edton, S. yakowtz and F. zdarovsky, Mawell/Macmllan, 990. Derentaton
More information8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS
SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 493 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces you have studed thus far n the text are real vector spaces because the scalars
More informationFeature Selection: Part 1
CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?
More informationBOUNDEDNESS OF THE RIESZ TRANSFORM WITH MATRIX A 2 WEIGHTS
BOUNDEDNESS OF THE IESZ TANSFOM WITH MATIX A WEIGHTS Introducton Let L = L ( n, be the functon space wth norm (ˆ f L = f(x C dx d < For a d d matrx valued functon W : wth W (x postve sem-defnte for all
More informationNumerical Heat and Mass Transfer
Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and
More informationELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM
ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM An elastc wave s a deformaton of the body that travels throughout the body n all drectons. We can examne the deformaton over a perod of tme by fxng our look
More informationSupplementary Notes for Chapter 9 Mixture Thermodynamics
Supplementary Notes for Chapter 9 Mxture Thermodynamcs Key ponts Nne major topcs of Chapter 9 are revewed below: 1. Notaton and operatonal equatons for mxtures 2. PVTN EOSs for mxtures 3. General effects
More informationLecture 3: Dual problems and Kernels
Lecture 3: Dual problems and Kernels C4B Machne Learnng Hlary 211 A. Zsserman Prmal and dual forms Lnear separablty revsted Feature mappng Kernels for SVMs Kernel trck requrements radal bass functons SVM
More informationEconomics 130. Lecture 4 Simple Linear Regression Continued
Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do
More information2 STATISTICALLY OPTIMAL TRAINING DATA 2.1 A CRITERION OF OPTIMALITY We revew the crteron of statstcally optmal tranng data (Fukumzu et al., 1994). We
Advances n Neural Informaton Processng Systems 8 Actve Learnng n Multlayer Perceptrons Kenj Fukumzu Informaton and Communcaton R&D Center, Rcoh Co., Ltd. 3-2-3, Shn-yokohama, Yokohama, 222 Japan E-mal:
More information2016 Wiley. Study Session 2: Ethical and Professional Standards Application
6 Wley Study Sesson : Ethcal and Professonal Standards Applcaton LESSON : CORRECTION ANALYSIS Readng 9: Correlaton and Regresson LOS 9a: Calculate and nterpret a sample covarance and a sample correlaton
More informationWhy feed-forward networks are in a bad shape
Why feed-forward networks are n a bad shape Patrck van der Smagt, Gerd Hrznger Insttute of Robotcs and System Dynamcs German Aerospace Center (DLR Oberpfaffenhofen) 82230 Wesslng, GERMANY emal smagt@dlr.de
More informationMidterm Examination. Regression and Forecasting Models
IOMS Department Regresson and Forecastng Models Professor Wllam Greene Phone: 22.998.0876 Offce: KMC 7-90 Home page: people.stern.nyu.edu/wgreene Emal: wgreene@stern.nyu.edu Course web page: people.stern.nyu.edu/wgreene/regresson/outlne.htm
More information1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations
Physcs 171/271 -Davd Klenfeld - Fall 2005 (revsed Wnter 2011) 1 Dervaton of Rate Equatons from Sngle-Cell Conductance (Hodgkn-Huxley-lke) Equatons We consder a network of many neurons, each of whch obeys
More informationChapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems
Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons
More information2.3 Nilpotent endomorphisms
s a block dagonal matrx, wth A Mat dm U (C) In fact, we can assume that B = B 1 B k, wth B an ordered bass of U, and that A = [f U ] B, where f U : U U s the restrcton of f to U 40 23 Nlpotent endomorphsms
More informationThe conjugate prior to a Bernoulli is. A) Bernoulli B) Gaussian C) Beta D) none of the above
The conjugate pror to a Bernoull s A) Bernoull B) Gaussan C) Beta D) none of the above The conjugate pror to a Gaussan s A) Bernoull B) Gaussan C) Beta D) none of the above MAP estmates A) argmax θ p(θ
More informationDr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur
Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed
More informationStructure and Drive Paul A. Jensen Copyright July 20, 2003
Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.
More informationThe topics in this section concern with the second course objective. Correlation is a linear relation between two random variables.
4.1 Correlaton The topcs n ths secton concern wth the second course objectve. Correlaton s a lnear relaton between two random varables. Note that the term relaton used n ths secton means connecton or relatonshp
More informationLecture 12: Discrete Laplacian
Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly
More informationLecture 10 Support Vector Machines. Oct
Lecture 10 Support Vector Machnes Oct - 20-2008 Lnear Separators Whch of the lnear separators s optmal? Concept of Margn Recall that n Perceptron, we learned that the convergence rate of the Perceptron
More informationKernels in Support Vector Machines. Based on lectures of Martin Law, University of Michigan
Kernels n Support Vector Machnes Based on lectures of Martn Law, Unversty of Mchgan Non Lnear separable problems AND OR NOT() The XOR problem cannot be solved wth a perceptron. XOR Per Lug Martell - Systems
More informationFeb 14: Spatial analysis of data fields
Feb 4: Spatal analyss of data felds Mappng rregularly sampled data onto a regular grd Many analyss technques for geophyscal data requre the data be located at regular ntervals n space and/or tme. hs s
More informationx = , so that calculated
Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to
More informationDUE: WEDS FEB 21ST 2018
HOMEWORK # 1: FINITE DIFFERENCES IN ONE DIMENSION DUE: WEDS FEB 21ST 2018 1. Theory Beam bendng s a classcal engneerng analyss. The tradtonal soluton technque makes smplfyng assumptons such as a constant
More informationMore metrics on cartesian products
More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of
More informationU.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016
U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and
More informationCHAPTER III Neural Networks as Associative Memory
CHAPTER III Neural Networs as Assocatve Memory Introducton One of the prmary functons of the bran s assocatve memory. We assocate the faces wth names, letters wth sounds, or we can recognze the people
More information