A Simple Weight Decay Can Improve Generalization
In Advances in Neural Information Processing Systems 4, J.E. Moody, S.J. Hanson and R.P. Lippmann, eds. Morgan Kaufmann Publishers, San Mateo CA, 1992, pp. 950-957.

A Simple Weight Decay Can Improve Generalization

Anders Krogh
CONNECT, The Niels Bohr Institute
Blegdamsvej 17, DK-2100 Copenhagen, Denmark
krogh@cse.ucsc.edu

John A. Hertz
Nordita
Blegdamsvej 17, DK-2100 Copenhagen, Denmark
hertz@nordita.dk

Abstract

It has been observed in numerical simulations that a weight decay can improve generalization in a feed-forward neural network. This paper explains why. It is proven that a weight decay has two effects in a linear network. First, it suppresses any irrelevant components of the weight vector by choosing the smallest vector that solves the learning problem. Second, if the size is chosen right, a weight decay can suppress some of the effects of static noise on the targets, which improves generalization quite a lot. It is then shown how to extend these results to networks with hidden layers and non-linear units. Finally the theory is confirmed by some numerical simulations using the data from NetTalk.

1 INTRODUCTION

Many recent studies have shown that the generalization ability of a neural network (or any other `learning machine') depends on a balance between the information in the training examples and the complexity of the network; see for instance [1, 2, 3]. Bad generalization occurs if the information does not match the complexity, e.g. if the network is very complex and there is little information in the training set. In this last instance the network will be over-fitting the data, and the opposite situation corresponds to under-fitting.

Present address: Computer and Information Sciences, Univ. of California Santa Cruz, Santa Cruz, CA
Often the number of free parameters, i.e. the number of weights and thresholds, is used as a measure of the network complexity, and algorithms have been developed which minimize the number of weights while still keeping the error on the training examples small [4, 5, 6]. This minimization of the number of free parameters is not always what is needed.

A different way to constrain a network, and thus decrease its complexity, is to limit the growth of the weights through some kind of weight decay. It should prevent the weights from growing too large unless it is really necessary. It can be realized by adding a term to the cost function that penalizes large weights,

\[ E(w) = E_0(w) + \frac{\lambda}{2} \sum_i w_i^2, \qquad (1) \]

where $E_0$ is one's favorite error measure (usually the sum of squared errors), and $\lambda$ is a parameter governing how strongly large weights are penalized. $w$ is a vector containing all free parameters of the network; it will be called the weight vector. If gradient descent is used for learning, the last term in the cost function leads to a new term $-\lambda w_i$ in the weight update:

\[ \dot{w}_i \propto -\frac{\partial E_0}{\partial w_i} - \lambda w_i. \qquad (2) \]

Here it is formulated in continuous time. If the gradient of $E_0$ (the `force term') were not present, this equation would lead to an exponential decay of the weights.

Obviously there are infinitely many possibilities for choosing other forms of the additional term in (1), but here we will concentrate on this simple form. It has been known for a long time that a weight decay of this form can improve generalization [7], but until now it has not been very widely recognized. The aim of this paper is to analyze this effect both theoretically and experimentally. Weight decay as a special kind of regularization is also discussed in [8, 9].

2 FEED-FORWARD NETWORKS

A feed-forward neural network implements a function of the inputs that depends on the weight vector $w$; it is called $f_w$. For simplicity it is assumed that there is only one output unit. When the input is $\xi$, the output is $f_w(\xi)$.
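As an illustration of (1) and (2), gradient descent with the extra decay term can be sketched in a few lines of NumPy. This is a hypothetical discrete-time sketch, not the paper's code; the random linear "teacher" data, learning rate and $\lambda$ are arbitrary illustrative choices:

```python
import numpy as np

# Sketch of gradient descent on the penalized cost
#   E(w) = E0(w) + (lam/2) * sum_i w_i^2,
# whose update rule gains the weight-decay term -lam * w_i, as in (2).
# Data, model and hyperparameters are illustrative assumptions.
rng = np.random.default_rng(0)
N, P = 20, 10                       # input dimension, number of patterns
X = rng.standard_normal((P, N))     # training inputs
u = rng.standard_normal(N)          # "teacher" weight vector
y = X @ u                           # targets produced by the teacher

def train(lam, lr=0.01, steps=5000):
    w = np.zeros(N)
    for _ in range(steps):
        grad_E0 = X.T @ (X @ w - y) / P   # gradient of the squared error E0
        w -= lr * (grad_E0 + lam * w)     # extra term: exponential weight decay
    return w

w_plain = train(lam=0.0)
w_decay = train(lam=0.01)
# Both solutions fit the training data well, but the decayed
# weight vector has the smaller norm.
print(np.linalg.norm(w_decay) < np.linalg.norm(w_plain))  # -> True
```

With the error gradient removed, the update reduces to $w \leftarrow (1 - \mathrm{lr}\,\lambda)\,w$, the exponential decay mentioned in the text.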
Note that the input vector $\xi$ is a vector in the $N$-dimensional input space, whereas the weight vector $w$ is a vector in the weight space, which has a different dimension $W$.

The aim of the learning is not only to learn the examples, but to learn the underlying function that produces the targets for the learning process. First, we assume that this target function can actually be implemented by the network. This means there exists a weight vector $u$ such that the target function is equal to $f_u$. The network with parameters $u$ is often called the teacher, because from input vectors it can produce the right targets. The sum of squared errors is

\[ E_0(w) = \frac{1}{2} \sum_{\mu=1}^{p} \left[ f_u(\xi^\mu) - f_w(\xi^\mu) \right]^2, \qquad (3) \]
where $p$ is the number of training patterns. The learning equation (2) can then be written

\[ \dot{w}_i \propto \sum_\mu \left[ f_u(\xi^\mu) - f_w(\xi^\mu) \right] \frac{\partial f_w(\xi^\mu)}{\partial w_i} - \lambda w_i. \qquad (4) \]

Now the idea is to expand this around the solution $u$, but first the linear case will be analyzed in some detail.

3 THE LINEAR PERCEPTRON

The simplest kind of `network' is the linear perceptron, characterized by

\[ f_w(\xi) = N^{-1/2} \sum_i w_i \xi_i, \qquad (5) \]

where the $N^{-1/2}$ is just a convenient normalization factor. Here the dimension of the weight space ($W$) is the same as the dimension of the input space ($N$). The learning equation then takes the simple form

\[ \dot{w}_i \propto N^{-1} \sum_\mu \sum_j [u_j - w_j] \xi_j^\mu \xi_i^\mu - \lambda w_i. \qquad (6) \]

Defining

\[ v_i \equiv u_i - w_i \qquad (7) \]

and

\[ A_{ij} \equiv N^{-1} \sum_\mu \xi_i^\mu \xi_j^\mu, \qquad (8) \]

it becomes

\[ \dot{v}_i \propto - \sum_j A_{ij} v_j + \lambda (u_i - v_i). \qquad (9) \]

Transforming this equation to the basis where $A$ is diagonal yields

\[ \dot{v}_r \propto -(A_r + \lambda) v_r + \lambda u_r, \qquad (10) \]

where $A_r$ are the eigenvalues of $A$, and a subscript $r$ indicates transformation to this basis.

The generalization error is defined as the error averaged over the distribution of input vectors,

\[ F = \langle [f_u(\xi) - f_w(\xi)]^2 \rangle = \langle N^{-1} (\textstyle\sum_i v_i \xi_i)^2 \rangle = N^{-1} \sum_{ij} v_i v_j \langle \xi_i \xi_j \rangle = N^{-1} \sum_i v_i^2. \qquad (11) \]

Here it is assumed that $\langle \xi_i \xi_j \rangle = \delta_{ij}$. The generalization error $F$ is thus proportional to $|v|^2$, which is also quite natural.

The eigenvalues of the covariance matrix $A$ are non-negative, and its rank can easily be shown to be less than or equal to $p$. It is also easily seen that all eigenvectors belonging to eigenvalues larger than 0 lie in the subspace of weight space spanned
by the input patterns $\xi^1, \ldots, \xi^p$. This subspace, called the pattern subspace, will be denoted $V_p$, and the orthogonal subspace is denoted by $V_\perp$. When there are sufficiently many examples they span the whole space, and there will be no zero eigenvalues. This can only happen for $p \ge N$.

When $\lambda = 0$ the solution to (10) inside $V_p$ is just a simple exponential decay to $v_r = 0$. Outside the pattern subspace $A_r = 0$, and the corresponding part of $v_r$ will be constant. Any weight vector which has the same projection onto the pattern subspace as $u$ gives a learning error of 0. One can think of this as a `valley' in the error surface given by $u + V_\perp$. The training set contains no information that can help us choose between all these solutions to the learning problem.

When learning with a weight decay $\lambda > 0$, the constant part in $V_\perp$ will decay to zero asymptotically (as $e^{-\lambda t}$, where $t$ is the time). An infinitesimal weight decay will therefore choose the solution with the smallest norm out of all the solutions in the valley described above. This solution can be shown to be the optimal one on average.

4 LEARNING WITH AN UNRELIABLE TEACHER

Random errors made by the teacher can be modeled by adding a random term $\eta^\mu$ to the targets:

\[ f_u(\xi^\mu) \longrightarrow f_u(\xi^\mu) + \eta^\mu. \qquad (12) \]

The variance of $\eta^\mu$ is called $\sigma^2$, and it is assumed to have zero mean. Note that these targets are not exactly realizable by the network (for $\sigma > 0$), and therefore this is a simple model for studying learning of an unrealizable function. With this noise the learning equation (2) becomes

\[ \dot{w}_i \propto \sum_\mu \left( N^{-1} \sum_j v_j \xi_j^\mu + N^{-1/2} \eta^\mu \right) \xi_i^\mu - \lambda w_i. \qquad (13) \]

Transforming it to the basis where $A$ is diagonal as before,

\[ \dot{v}_r \propto -(A_r + \lambda) v_r + \lambda u_r - N^{-1/2} \sum_\mu \eta^\mu \xi_r^\mu. \qquad (14) \]

The asymptotic solution to this equation is

\[ v_r = \frac{\lambda u_r - N^{-1/2} \sum_\mu \eta^\mu \xi_r^\mu}{\lambda + A_r}. \qquad (15) \]

The contribution to the generalization error is the square of this summed over all $r$.
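The claim that an infinitesimal weight decay picks the smallest-norm solution in the valley can be verified directly in the linear case: as $\lambda \to 0^+$ the asymptotic weight-decay (ridge) solution converges to the minimum-norm interpolating solution given by the pseudoinverse. A small NumPy sketch with hypothetical random data ($p < N$, so the valley is non-trivial); the closed-form ridge solution used here is the standard minimizer of $|y - Xw|^2/2 + (\lambda/2)|w|^2$, an assumption of this sketch rather than code from the paper:

```python
import numpy as np

# For p < N the zero-error solutions form the valley u + V_perp.
# The ridge solution w(lam) = X^T (X X^T + lam I)^{-1} y solves the
# weight-decayed least-squares problem; as lam -> 0+ it approaches the
# minimum-norm interpolating solution pinv(X) @ y.
rng = np.random.default_rng(1)
N, P = 12, 5                        # weight-space dimension, patterns (P < N)
X = rng.standard_normal((P, N))
u = rng.standard_normal(N)
y = X @ u

lam = 1e-10                         # "infinitesimal" weight decay
w_ridge = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(P), y)
w_minnorm = np.linalg.pinv(X) @ y   # smallest-norm solution in the valley

print(np.allclose(w_ridge, w_minnorm, atol=1e-6))   # -> True
print(np.allclose(X @ w_ridge, y, atol=1e-6))       # zero learning error -> True
```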
If averaged over the noise (shown by the bar) it becomes, for each $r$,

\[ \bar{F}_r = \overline{v_r^2} = \frac{\lambda^2 u_r^2 + N^{-1} \overline{(\sum_\mu \eta^\mu \xi_r^\mu)^2}}{(\lambda + A_r)^2} = \frac{\lambda^2 u_r^2 + \sigma^2 A_r}{(\lambda + A_r)^2}. \qquad (16) \]

The last expression has a minimum in $\lambda$, which can be found by putting the derivative with respect to $\lambda$ equal to zero: $\lambda_r^{\mathrm{optimal}} = \sigma^2 / u_r^2$. Remarkably it depends only
on $u$ and the variance of the noise, and not on $A$. If it is assumed that $u$ is random, (16) can be averaged over $u$. This yields an optimal $\lambda$ independent of $r$,

\[ \lambda^{\mathrm{optimal}} = \sigma^2 / \bar{u}^2, \qquad (17) \]

where $\bar{u}^2$ is the average of $N^{-1} |u|^2$. In this case the weight decay to some extent prevents the network from fitting the noise.

Figure 1: Generalization error $F$ as a function of $\alpha = p/N$. The full line is for $\lambda = \sigma^2 = 0.2$, and the dashed line for $\lambda = 0$. The dotted line is the generalization error with no noise and $\lambda = 0$.

From equation (14) one can see that the noise is projected onto the pattern subspace. Therefore the contribution to the generalization error from $V_\perp$ is the same as before, and this contribution is on average minimized by a weight decay of any size.

Equation (17) was derived in [10] in the context of a particular eigenvalue spectrum. Figure 1 shows the dramatic improvement in generalization error when the optimal weight decay is used in this case. The present treatment shows that (17) is independent of the spectrum of $A$.

We conclude that a weight decay has two positive effects on generalization in a linear network: 1) It suppresses any irrelevant components of the weight vector by choosing the smallest vector that solves the learning problem. 2) If the size is chosen right, it can suppress some of the effect of static noise on the targets.

5 NON-LINEAR NETWORKS

It is not possible to analyze a general non-linear network exactly, as done above for the linear case. By a local linearization it is, however, possible to draw some interesting conclusions from the results in the previous section.

Assume the function is realizable, $f = f_u$. Then learning corresponds to solving the $p$ equations

\[ f_w(\xi^\mu) = f_u(\xi^\mu) \qquad (18) \]
in the $W$ variables $w_i$, where $W$ is the number of weights. For $p < W$ these equations define a manifold in weight space of dimension at least $W - p$. Any point $\tilde{w}$ on this manifold gives a learning error of zero, and therefore (4) can be expanded around $\tilde{w}$. Putting $v = \tilde{w} - w$, expanding $f_w$ in $v$, and using it in (4) gives

\[ \dot{v}_i \propto - \sum_j \left[ \sum_\mu \frac{\partial f_w(\xi^\mu)}{\partial w_i} \frac{\partial f_w(\xi^\mu)}{\partial w_j} \right] v_j + \lambda (\tilde{w}_i - v_i) = - \sum_j A_{ij}(\tilde{w}) v_j - \lambda v_i + \lambda \tilde{w}_i. \qquad (19) \]

(The derivatives in this equation should be taken at $\tilde{w}$.) The analogue of $A$ is defined as

\[ A_{ij}(\tilde{w}) \equiv \sum_\mu \frac{\partial f_w(\xi^\mu)}{\partial w_i} \frac{\partial f_w(\xi^\mu)}{\partial w_j}. \qquad (20) \]

Since it is of outer product form (like $A$) its rank $R(\tilde{w}) \le \min\{p, W\}$. Thus when $p < W$, $A$ is never of full rank. The rank of $A$ is of course equal to $W$ minus the dimension of the manifold mentioned above.

From these simple observations one can argue that good generalization should not be expected for $p < W$. This is in accordance with other results (cf. [3]), and with current `folk-lore'. The difference from the linear case is that the `rain gutter' need not be (and most probably is not) linear, but curved in this case. There may in fact be other valleys or rain gutters disconnected from the one containing $u$. One can also see that if $A$ has full rank, all points in the immediate neighborhood of $\tilde{w} = u$ give a learning error larger than 0, i.e. there is a simple minimum at $u$.

Assume that the learning finds one of these valleys. A small weight decay will pick out the point in the valley with the smallest norm among all the points in the valley. In general it can not be proven that picking that solution is the best strategy. But, at least from a philosophical point of view, it seems sensible, because it is (in a loose sense) the solution with the smallest complexity, the one that Ockham would probably have chosen.

The value of a weight decay is more evident if there are small errors in the targets. In that case one can go through exactly the same line of arguments as for the linear case to show that a weight decay can improve generalization, and even with the same optimal choice (17) of $\lambda$.
This is strictly true only for small errors (where the linear approximation is valid).

6 NUMERICAL EXPERIMENTS

A weight decay has been tested on the NetTalk problem [11]. In the simulations back-propagation derived from the `entropic error measure' [12], with a momentum term fixed at 0.8, was used. The network had 726 input units, 40 hidden units and 26 output units, in all about 8400 weights. It was trained on 400 to 5000 random words from the data base, and tested on a different set of 1000 random words. The training set and test set were independent from run to run.
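Returning for a moment to the linear theory, the optimal weight decay (17) is also easy to verify numerically: minimizing the per-mode error $\bar F_r(\lambda) = (\lambda^2 u_r^2 + \sigma^2 A_r)/(\lambda + A_r)^2$ of equation (16) over a grid should give $\lambda \approx \sigma^2/u_r^2$ for any eigenvalue $A_r$. The concrete numbers in this sketch are arbitrary illustrative choices, not values from the paper:

```python
import numpy as np

# Grid search confirming that the minimizer of the noise-averaged error
#   F_r(lam) = (lam^2 u_r^2 + sigma^2 A_r) / (lam + A_r)^2      (eq. 16)
# is lam = sigma^2 / u_r^2 (eq. 17), independent of the eigenvalue A_r.
# u_r, sigma^2 and the A_r values are arbitrary test numbers.

def F_r(lam, u_r, A_r, sigma2):
    return (lam**2 * u_r**2 + sigma2 * A_r) / (lam + A_r) ** 2

u_r, sigma2 = 1.5, 0.2
lam_star = sigma2 / u_r**2          # predicted optimum: 0.2 / 2.25 ~= 0.089
lams = np.linspace(1e-4, 1.0, 100001)
for A_r in (0.1, 0.5, 2.0):
    best = lams[np.argmin(F_r(lams, u_r, A_r, sigma2))]
    print(abs(best - lam_star) < 1e-3)   # -> True for every A_r
```

The minimizer does not move as $A_r$ changes, which is exactly the "remarkable" independence noted in Section 4.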
Figure 2: The top full line corresponds to the generalization error after 300 epochs (300 cycles through the training set) without a weight decay. The lower full line is with a weight decay. The top dotted line is the lowest error seen during learning without a weight decay, and the lower dotted line with a weight decay. The size of the weight decay was $\lambda = 0.\ldots$ Insert: Same figure except that the error rate is shown instead of the squared error. The error rate is the fraction of wrong phonemes when the phoneme vector with the smallest angle to the actual output is chosen, see [11].

Results are shown in fig. 2. There is a clear improvement in generalization error when weight decay is used. There is also an improvement in error rate (insert of fig. 2), but it is less pronounced in terms of relative improvement. Two other values of the weight decay were also tried and gave basically the same curves.

7 CONCLUSION

It was shown how a weight decay can improve generalization in two ways: 1) It suppresses any irrelevant components of the weight vector by choosing the smallest vector that solves the learning problem. 2) If the size is chosen right, a weight decay can suppress some of the effect of static noise on the targets. Static noise on the targets can be viewed as a model of learning an unrealizable function.

The analysis assumed that the network could be expanded around an optimal weight vector, and
therefore it is strictly only valid in a small neighborhood around that vector.

The improvement from a weight decay was also tested by simulations. For the NetTalk data it was shown that a weight decay can decrease the generalization error (squared error) and also, although less significantly, the actual mistake rate of the network when the phoneme closest to the output is chosen.

Acknowledgements

AK acknowledges support from the Danish Natural Science Council and the Danish Technical Research Council through the Computational Neural Network Center (CONNECT).

References

[1] D.B. Schwartz, V.K. Samalam, S.A. Solla, and J.S. Denker. Exhaustive learning. Neural Computation, 2:371-382.

[2] N. Tishby, E. Levin, and S.A. Solla. Consistent inference of probabilities in layered networks: predictions and generalization. In International Joint Conference on Neural Networks, pages 403-410, (Washington 1989), IEEE, New York.

[3] E.B. Baum and D. Haussler. What size net gives valid generalization? Neural Computation, 1:151-160.

[4] Y. Le Cun, J.S. Denker, and S.A. Solla. Optimal brain damage. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems, pages 598-605, (Denver 1989), Morgan Kaufmann, San Mateo.

[5] H.H. Thodberg. Improving generalization of neural networks through pruning. International Journal of Neural Systems, 1:317-326.

[6] A.S. Weigend, D.E. Rumelhart, and B.A. Huberman. Generalization by weight-elimination with application to forecasting. In R.P. Lippmann et al., editors, Advances in Neural Information Processing Systems, pages 875-882, (Denver 1989), Morgan Kaufmann, San Mateo.

[7] G.E. Hinton. Learning translation invariant recognition in a massively parallel network. In G. Goos and J. Hartmanis, editors, PARLE: Parallel Architectures and Languages Europe. Lecture Notes in Computer Science, pages 1-13, Springer-Verlag, Berlin.

[8] J. Moody. Generalization, weight decay, and architecture selection for nonlinear learning systems. These proceedings.

[9] D. MacKay.
A practical Bayesian framework for backprop networks. These proceedings.

[10] A. Krogh and J.A. Hertz. Generalization in a linear perceptron in the presence of noise. To appear in Journal of Physics A.

[11] T.J. Sejnowski and C.R. Rosenberg. Parallel networks that learn to pronounce English text. Complex Systems, 1:145-168.

[12] J.A. Hertz, A. Krogh, and R.G. Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City.
More informationPositive decomposition of transfer functions with multiple poles
Positive decomosition of transfer functions with multile oles Béla Nagy 1, Máté Matolcsi 2, and Márta Szilvási 1 Deartment of Analysis, Technical University of Budaest (BME), H-1111, Budaest, Egry J. u.
More informationAn Introduction to Information Theory: Notes
An Introduction to Information Theory: Notes Jon Shlens jonshlens@ucsd.edu 03 February 003 Preliminaries. Goals. Define basic set-u of information theory. Derive why entroy is the measure of information
More informationEstimating Time-Series Models
Estimating ime-series Models he Box-Jenkins methodology for tting a model to a scalar time series fx t g consists of ve stes:. Decide on the order of di erencing d that is needed to roduce a stationary
More informationCSC165H, Mathematical expression and reasoning for computer science week 12
CSC165H, Mathematical exression and reasoning for comuter science week 1 nd December 005 Gary Baumgartner and Danny Hea hea@cs.toronto.edu SF4306A 416-978-5899 htt//www.cs.toronto.edu/~hea/165/s005/index.shtml
More information2 K. ENTACHER 2 Generalized Haar function systems In the following we x an arbitrary integer base b 2. For the notations and denitions of generalized
BIT 38 :2 (998), 283{292. QUASI-MONTE CARLO METHODS FOR NUMERICAL INTEGRATION OF MULTIVARIATE HAAR SERIES II KARL ENTACHER y Deartment of Mathematics, University of Salzburg, Hellbrunnerstr. 34 A-52 Salzburg,
More informationbelow, kernel PCA Eigenvectors, and linear combinations thereof. For the cases where the pre-image does exist, we can provide a means of constructing
Kernel PCA Pattern Reconstruction via Approximate Pre-Images Bernhard Scholkopf, Sebastian Mika, Alex Smola, Gunnar Ratsch, & Klaus-Robert Muller GMD FIRST, Rudower Chaussee 5, 12489 Berlin, Germany fbs,
More informationWolfgang POESSNECKER and Ulrich GROSS*
Proceedings of the Asian Thermohysical Proerties onference -4 August, 007, Fukuoka, Jaan Paer No. 0 A QUASI-STEADY YLINDER METHOD FOR THE SIMULTANEOUS DETERMINATION OF HEAT APAITY, THERMAL ONDUTIVITY AND
More informationDeriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V.
Deriving ndicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V. Deutsch Centre for Comutational Geostatistics Deartment of Civil &
More informationInformation collection on a graph
Information collection on a grah Ilya O. Ryzhov Warren Powell February 10, 2010 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements
More informationComputer arithmetic. Intensive Computation. Annalisa Massini 2017/2018
Comuter arithmetic Intensive Comutation Annalisa Massini 7/8 Intensive Comutation - 7/8 References Comuter Architecture - A Quantitative Aroach Hennessy Patterson Aendix J Intensive Comutation - 7/8 3
More informationON POLYNOMIAL SELECTION FOR THE GENERAL NUMBER FIELD SIEVE
MATHEMATICS OF COMPUTATIO Volume 75, umber 256, October 26, Pages 237 247 S 25-5718(6)187-9 Article electronically ublished on June 28, 26 O POLYOMIAL SELECTIO FOR THE GEERAL UMBER FIELD SIEVE THORSTE
More informationDesign of NARMA L-2 Control of Nonlinear Inverted Pendulum
International Research Journal of Alied and Basic Sciences 016 Available online at www.irjabs.com ISSN 51-838X / Vol, 10 (6): 679-684 Science Exlorer Publications Design of NARMA L- Control of Nonlinear
More informationOn 2-Armed Gaussian Bandits and Optimization
On -Armed Gaussian Bandits and Otimization William G. Macready David H. Wolert SFI WORKING PAPER: 996--9 SFI Working Paers contain accounts of scientific work of the author(s) and do not necessarily reresent
More informationMATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK
Comuter Modelling and ew Technologies, 5, Vol.9, o., 3-39 Transort and Telecommunication Institute, Lomonosov, LV-9, Riga, Latvia MATHEMATICAL MODELLIG OF THE WIRELESS COMMUICATIO ETWORK M. KOPEETSK Deartment
More informationarxiv: v1 [physics.data-an] 26 Oct 2012
Constraints on Yield Parameters in Extended Maximum Likelihood Fits Till Moritz Karbach a, Maximilian Schlu b a TU Dortmund, Germany, moritz.karbach@cern.ch b TU Dortmund, Germany, maximilian.schlu@cern.ch
More informationOrdered and disordered dynamics in random networks
EUROPHYSICS LETTERS 5 March 998 Eurohys. Lett., 4 (6),. 599-604 (998) Ordered and disordered dynamics in random networks L. Glass ( )andc. Hill 2,3 Deartment of Physiology, McGill University 3655 Drummond
More informationObserver/Kalman Filter Time Varying System Identification
Observer/Kalman Filter Time Varying System Identification Manoranjan Majji Texas A&M University, College Station, Texas, USA Jer-Nan Juang 2 National Cheng Kung University, Tainan, Taiwan and John L. Junins
More informationLEIBNIZ SEMINORMS IN PROBABILITY SPACES
LEIBNIZ SEMINORMS IN PROBABILITY SPACES ÁDÁM BESENYEI AND ZOLTÁN LÉKA Abstract. In this aer we study the (strong) Leibniz roerty of centered moments of bounded random variables. We shall answer a question
More informationConvex Optimization methods for Computing Channel Capacity
Convex Otimization methods for Comuting Channel Caacity Abhishek Sinha Laboratory for Information and Decision Systems (LIDS), MIT sinhaa@mit.edu May 15, 2014 We consider a classical comutational roblem
More informationRecursive Estimation of the Preisach Density function for a Smart Actuator
Recursive Estimation of the Preisach Density function for a Smart Actuator Ram V. Iyer Deartment of Mathematics and Statistics, Texas Tech University, Lubbock, TX 7949-142. ABSTRACT The Preisach oerator
More informationIntrinsic Approximation on Cantor-like Sets, a Problem of Mahler
Intrinsic Aroximation on Cantor-like Sets, a Problem of Mahler Ryan Broderick, Lior Fishman, Asaf Reich and Barak Weiss July 200 Abstract In 984, Kurt Mahler osed the following fundamental question: How
More informationBayesian Model Averaging Kriging Jize Zhang and Alexandros Taflanidis
HIPAD LAB: HIGH PERFORMANCE SYSTEMS LABORATORY DEPARTMENT OF CIVIL AND ENVIRONMENTAL ENGINEERING AND EARTH SCIENCES Bayesian Model Averaging Kriging Jize Zhang and Alexandros Taflanidis Why use metamodeling
More informationAn Ant Colony Optimization Approach to the Probabilistic Traveling Salesman Problem
An Ant Colony Otimization Aroach to the Probabilistic Traveling Salesman Problem Leonora Bianchi 1, Luca Maria Gambardella 1, and Marco Dorigo 2 1 IDSIA, Strada Cantonale Galleria 2, CH-6928 Manno, Switzerland
More informationShadow Computing: An Energy-Aware Fault Tolerant Computing Model
Shadow Comuting: An Energy-Aware Fault Tolerant Comuting Model Bryan Mills, Taieb Znati, Rami Melhem Deartment of Comuter Science University of Pittsburgh (bmills, znati, melhem)@cs.itt.edu Index Terms
More informationGeneration of Linear Models using Simulation Results
4. IMACS-Symosium MATHMOD, Wien, 5..003,. 436-443 Generation of Linear Models using Simulation Results Georg Otte, Sven Reitz, Joachim Haase Fraunhofer Institute for Integrated Circuits, Branch Lab Design
More informationInformation collection on a graph
Information collection on a grah Ilya O. Ryzhov Warren Powell October 25, 2009 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements
More informationUniformly best wavenumber approximations by spatial central difference operators: An initial investigation
Uniformly best wavenumber aroximations by satial central difference oerators: An initial investigation Vitor Linders and Jan Nordström Abstract A characterisation theorem for best uniform wavenumber aroximations
More informationA Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition
A Qualitative Event-based Aroach to Multile Fault Diagnosis in Continuous Systems using Structural Model Decomosition Matthew J. Daigle a,,, Anibal Bregon b,, Xenofon Koutsoukos c, Gautam Biswas c, Belarmino
More informationŽ. Ž. Ž. 2 QUADRATIC AND INVERSE REGRESSIONS FOR WISHART DISTRIBUTIONS 1
The Annals of Statistics 1998, Vol. 6, No., 573595 QUADRATIC AND INVERSE REGRESSIONS FOR WISHART DISTRIBUTIONS 1 BY GERARD LETAC AND HELENE ` MASSAM Universite Paul Sabatier and York University If U and
More informationFinding recurrent sources in sequences
Finding recurrent sources in sequences Aristides Gionis Deartment of Comuter Science Stanford University Stanford, CA, 94305, USA gionis@cs.stanford.edu Heikki Mannila HIIT Basic Research Unit Deartment
More informationHomework Solution 4 for APPM4/5560 Markov Processes
Homework Solution 4 for APPM4/556 Markov Processes 9.Reflecting random walk on the line. Consider the oints,,, 4 to be marked on a straight line. Let X n be a Markov chain that moves to the right with
More informationPreconditioning techniques for Newton s method for the incompressible Navier Stokes equations
Preconditioning techniques for Newton s method for the incomressible Navier Stokes equations H. C. ELMAN 1, D. LOGHIN 2 and A. J. WATHEN 3 1 Deartment of Comuter Science, University of Maryland, College
More informationarxiv: v2 [stat.ml] 7 May 2015
Semi-Orthogonal Multilinear PCA with Relaxed Start Qiuan Shi and Haiing Lu Deartment of Comuter Science Hong Kong Batist University, Hong Kong, China csshi@com.hkbu.edu.hk, haiing@hkbu.edu.hk arxiv:1504.08142v2
More informationA Closed-Form Solution to the Minimum V 2
Celestial Mechanics and Dynamical Astronomy manuscrit No. (will be inserted by the editor) Martín Avendaño Daniele Mortari A Closed-Form Solution to the Minimum V tot Lambert s Problem Received: Month
More informationMetrics Performance Evaluation: Application to Face Recognition
Metrics Performance Evaluation: Alication to Face Recognition Naser Zaeri, Abeer AlSadeq, and Abdallah Cherri Electrical Engineering Det., Kuwait University, P.O. Box 5969, Safat 6, Kuwait {zaery, abeer,
More informationLower bound solutions for bearing capacity of jointed rock
Comuters and Geotechnics 31 (2004) 23 36 www.elsevier.com/locate/comgeo Lower bound solutions for bearing caacity of jointed rock D.J. Sutcliffe a, H.S. Yu b, *, S.W. Sloan c a Deartment of Civil, Surveying
More informationAn Algorithm of Two-Phase Learning for Eleman Neural Network to Avoid the Local Minima Problem
IJCSNS International Journal of Comuter Science and Networ Security, VOL.7 No.4, Aril 2007 1 An Algorithm of Two-Phase Learning for Eleman Neural Networ to Avoid the Local Minima Problem Zhiqiang Zhang,
More informationBayesian System for Differential Cryptanalysis of DES
Available online at www.sciencedirect.com ScienceDirect IERI Procedia 7 (014 ) 15 0 013 International Conference on Alied Comuting, Comuter Science, and Comuter Engineering Bayesian System for Differential
More informationA Game Theoretic Investigation of Selection Methods in Two Population Coevolution
A Game Theoretic Investigation of Selection Methods in Two Poulation Coevolution Sevan G. Ficici Division of Engineering and Alied Sciences Harvard University Cambridge, Massachusetts 238 USA sevan@eecs.harvard.edu
More informationThe non-stochastic multi-armed bandit problem
Submitted for journal ublication. The non-stochastic multi-armed bandit roblem Peter Auer Institute for Theoretical Comuter Science Graz University of Technology A-8010 Graz (Austria) auer@igi.tu-graz.ac.at
More information