Influence of weight initialization on multilayer perceptron performance
M. Karouia (1,2), T. Denœux (1), R. Lengellé (1)
(1) Université de Compiègne, U.R.A. CNRS 817 Heudiasyc, BP F-66 Compiègne cedex, France. mkarouia@hds.univ-compiegne.fr
(2) Lyonnaise des Eaux (LIAC)

Abstract

This paper presents a new algorithm for initializing the weights in multilayer perceptrons. The method is based on the use of feature vectors extracted by discriminant analysis. Simulations carried out with real-world and synthetic data sets show that the proposed algorithm yields a better initial state than random initialization. As a result, training time is reduced and lower generalization error can be achieved. Additionally, numerical simulations show that the generalization performance of networks initialized with the proposed method becomes less sensitive to network size and input dimension.

1 Introduction

Many researchers have emphasized the importance of initial weights in multilayer perceptron (MLP) training, and several initialization algorithms have been proposed, such as the use of prototypes [2]. The most obvious potential benefits of starting optimization from a good initial state are faster training and
higher probability of reaching a deep minimum of the error function. Additionally, it has been found that introducing prior knowledge in the initial weights may in some cases improve generalization performance [2, 8]. In this paper, a new approach to weight initialization is proposed, and its effect on generalization is demonstrated experimentally. The starting point of this work is the relationship between MLPs and discriminant analysis (DA) pointed out by Gallinari [4]: training networks with one hidden layer using the quadratic error function can be shown to be equivalent to maximizing a measure of class separability in the space spanned by the hidden units. DA techniques aim at extracting features that are effective in preserving class separability. The algorithm presented in this paper (WIDA: Weight Initialization by Discriminant Analysis) uses such features to initialize the weights of multilayer networks before training by standard back-propagation (BP) or any other learning procedure. The performance of the WIDA method is then analyzed using several synthetic and real-world data sets. We examine the effect of weight initialization on the following aspects: convergence speed (training time), generalization error, and sensitivity of the generalization error to data dimensionality and number of hidden units.

2 The initialization method

2.1 Discriminant analysis

We consider a set X of N samples in a d-dimensional space. The samples are assumed to be partitioned into M disjoint subsets. Subset X_i, of size N_i, contains the samples associated with class Ω_i. Let x_ij be the j-th d-dimensional sample vector from class Ω_i. The mean vector of class Ω_i is m_i = (1/N_i) Σ_{j=1}^{N_i} x_ij, and the overall mean vector is m = (1/N) Σ_{i=1}^{M} N_i m_i. We define the parametric within-class scatter matrix W and the parametric between-class scatter matrix B respectively as:

W = (1/N) Σ_{i=1}^{M} Σ_{j=1}^{N_i} (x_ij − m_i)(x_ij − m_i)^T    (1)

B = (1/N) Σ_{i=1}^{M} N_i (m_i − m)(m_i − m)^T    (2)
where (·)^T denotes transposition. Matrix W is assumed to be positive definite, so that W^{-1} exists. Matrix B is a positive semidefinite matrix with rank at most equal to M − 1 (we assume that M ≤ d). The sum of W and B gives the parametric global covariance matrix G. In parametric discriminant analysis (PDA), we seek d-dimensional feature vectors τ maximizing Fisher's criterion J(τ):

J(τ) = (τ^T B τ) / (τ^T W τ)    (3)

Such features are obtained as the eigenvectors of W^{-1} B, each eigenvalue λ_i being equal to the Fisher criterion of its corresponding eigenvector τ_i (J(τ_i) = λ_i).

PDA has two serious shortcomings. First, the maximum number of discriminant vectors is limited to M − 1; when M = 2, PDA can extract only one discriminant vector. The second and more fundamental problem is the intrinsic parametric nature of PDA: when the class distributions are significantly non-normal, PDA cannot be expected to accurately determine good features preserving the complex structure needed for classification. Non-parametric discriminant analysis (NPDA) was introduced to overcome both of the aforementioned problems [3]. It is based on the use of a non-parametric between-class scatter matrix that measures between-class scatter on a local basis, using a k-nearest-neighbor (k-NN) approach. Let us first consider the case where M = 2. Let n_il(x) ∈ X (l = 1, ..., k) be the k nearest neighbors in class Ω_i of an arbitrary sample x ∈ X. The local mean of class Ω_i (the sample mean of the k NNs from Ω_i to x) is m_ki(x) = (1/k) Σ_{l=1}^{k} n_il(x). The non-parametric between-class scatter matrix is then defined as

B_{12,k} = (1/N) ( Σ_{x ∈ X_1} p_12(x) (x − m_k2(x))(x − m_k2(x))^T + Σ_{x ∈ X_2} p_12(x) (x − m_k1(x))(x − m_k1(x))^T )    (4)

The term p_12(x) is defined as a function of the distances between x and its k-th nearest neighbor from each class [3]. Its role is to de-emphasize the samples located far away from the class boundary.
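As a minimal illustration (our own NumPy/SciPy sketch, not the authors' implementation; the function names are ours), the parametric scatter matrices of Equations (1)–(2) and the discriminant directions maximizing Fisher's criterion (3) can be computed as follows:

```python
import numpy as np
from scipy.linalg import eigh

def scatter_matrices(X, y):
    """Parametric within-class (W) and between-class (B) scatter
    matrices of Equations (1) and (2), both normalized by N."""
    N, d = X.shape
    m = X.mean(axis=0)                       # overall mean vector m
    W = np.zeros((d, d))
    B = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)                 # class mean m_i
        Dc = Xc - mc
        W += Dc.T @ Dc                       # sum_j (x_ij - m_i)(x_ij - m_i)^T
        B += len(Xc) * np.outer(mc - m, mc - m)
    return W / N, B / N

def pda_directions(W, B):
    """Directions maximizing J(tau) = (tau^T B tau)/(tau^T W tau):
    generalized eigenvectors of B tau = lambda W tau (equivalently,
    eigenvectors of W^{-1} B), sorted by decreasing eigenvalue."""
    evals, evecs = eigh(B, W)                # W assumed positive definite
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]
```

Note that W + B equals the global covariance matrix G, and that for PDA at most M − 1 eigenvalues are non-zero; substituting the non-parametric matrix B_{12,k} for B in `pda_directions` would give the NPDA features, since B_{12,k} is generally of full rank.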
By substituting B_{12,k} for B in Equation (3), we obtain a non-parametric Fisher criterion J'(τ). The features maximizing J'(τ) can be obtained as the eigenvectors of W^{-1} B_{12,k}. Since B_{12,k} is generally of full rank, the number of discriminant vectors is no longer limited to M − 1.

To extend NPDA to general M-class problems, two alternatives have been studied. The first consists in considering M two-class problems, or dichotomies: for each dichotomy, one class is taken as Ω_1 and the other M − 1 classes as Ω_2, and discriminant vectors are extracted by the above procedure; the best discriminant vectors can then be chosen according to some selection procedure. The second alternative consists in defining a generalized non-parametric between-class scatter matrix as B_k = (1/N²) Σ_{i<j} N_i N_j B_{ij,k}.

2.2 Application to weight initialization

The WIDA method consists in initializing the hidden-unit weights as discriminant vectors extracted by non-parametric DA, and adding bias terms. Learning is then carried out in 3 steps:

1. the biases of the hidden neurons are determined so as to maximize class separability in the space H spanned by the hidden units; as shown in [5, 6], a suitable measure of class separability is tr(G_h^{-1} B_h), where G_h and B_h are respectively the total and between-class scatter matrices in H;

2. the hidden-to-output weights are initialized randomly and trained separately to minimize the mean square output error;

3. finally, further training of the whole network is performed using the standard back-propagation algorithm.

3 Comparison to random initialization

The above initialization procedure was tested and compared to other methods using the following data sets:

Waveform data: a three-class synthetic problem in a 21-dimensional feature space. Training and test sets both contain 1 samples of each class [1].
[Figure 1 panels: vowel data (11 hidden units), sonar data (5 hidden units), waveform data (4 hidden units); axes: misclassification rate (%) vs. epoch]

Figure 1: Mean test misclassification rate as a function of training cycles (averages over 1 trials). — : random; - - : WIDA; -.- : prototype method.

Vowel data: training and test data have 1 features and are partitioned into 11 classes. We used 528 randomly chosen samples for training and the 462 remaining samples for testing. A complete description of this data set is given in [7].

Sonar data: a real-world classification task [7] with 6 features and 2 classes. Training and test sets are both of size 14.

The network weights were initialized with the WIDA algorithm, the prototype method, and randomly. For each classification task, the number n of hidden units was varied from 2 to n_max. Training and test misclassification error rates were computed after each learning cycle. The algorithm was run 1 times for each value of n and each initialization method. Figure 1 shows the evolution of the mean error rates as a function of time for the three tasks. The means of the best error rates obtained at each trial by the three methods are shown in Figure 2 as a function of n. As expected, these results show that the WIDA method provides good initial solutions in terms of misclassification error. This results in faster training, although the gain is not very large because we used an accelerated version of back-propagation. The main advantage of our method turns out to be better generalization performance on all three classification tasks: the test error rates obtained with the WIDA method were always significantly lower than those obtained with random initialization (and, to a lesser extent, with the prototype method).
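The initialization of Section 2.2 can be sketched as follows (our own simplified illustration: the paper tunes the hidden biases by maximizing tr(G_h^{-1} B_h), whereas here each bias merely centers its hyperplane on the overall data mean, and `wida_init` is a hypothetical name):

```python
import numpy as np

def wida_init(V, X, n_hidden, seed=0):
    """Seed an MLP's input-to-hidden weights with discriminant
    vectors (columns of V, one per extracted direction), padding
    with small random vectors if fewer than n_hidden are available.
    Bias choice is a simplification of the paper's criterion:
    each hidden hyperplane passes through the overall data mean."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    V = V[:, :n_hidden]
    if V.shape[1] < n_hidden:                 # not enough discriminant vectors
        pad = 0.01 * rng.standard_normal((d, n_hidden - V.shape[1]))
        V = np.hstack([V, pad])
    W_hid = V.T                               # (n_hidden, d) weight matrix
    b_hid = -W_hid @ X.mean(axis=0)           # zero pre-activation at the mean
    return W_hid, b_hid
```

The hidden-to-output weights would then be drawn randomly and trained separately, before back-propagation fine-tunes the whole network, as in steps 2 and 3 above.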
[Figure 2 panels: vowel data, sonar data, waveform data; axes: misclassification rate (%) vs. number of hidden units]

Figure 2: Mean test misclassification rate as a function of n (averages over 1 trials). — : random; - - : WIDA; -.- : prototype method.

4 Influence of dimensionality and network size

The influence of dimensionality and number of weights on generalization performance was studied experimentally using a set of discrimination tasks similar to that used in [8]. Each task consists in discriminating between two multivariate Gaussian classes. Both classes have identity covariance matrix, and the class mean vectors are m_1 = (2, 0, ..., 0) and m_2 = −m_1. This parameterization keeps the Mahalanobis distance, and hence the theoretical Bayes error rate, constant across dimensions. Training sets of 1 samples (6 in each class) and test sets of 4 samples ( in each class) were randomly generated. The two initialization procedures tested were the WIDA method and random initialization. The number n of hidden units was varied from 2 to 1, and the data dimension d from 1 to 1 with a step of 1. For each configuration, the learning algorithm was run 1 times, and the mean misclassification error rates were computed over the 1 trials. Figure 3 shows the obtained mean misclassification rates with 95% confidence intervals as a function of d and n.

As shown in Figure 3, the generalization performance of randomly initialized networks degrades for large values of d and n. This dependence of the test error rate on the number of parameters to be estimated is well known in the pattern recognition and neural network literature as the peaking phenomenon [8]. This phenomenon turns out to be less important, in this case, when the initial weights are determined by discriminant analysis. The rate of increase of the test error rate as a function of d is smaller, and practically
[Figure 3: nine panels, one per number of hidden units (NHU = 2, 3, ...); axes: generalization error Egen vs. data dimension d]

Figure 3: Mean test misclassification rate and 95% confidence interval as a function of data dimension and number of hidden units (averages over 1 trials). -*- : WIDA initialization; -o- : random initialization; d = data dimension, Egen = generalization error, NHU = number of hidden units (n).
independent of n for 2 ≤ n ≤ 1. This finding can be interpreted by remarking that the WIDA method provides the learning algorithm with prior information concerning the data structure, in the form of discriminant axes. This restricts the search to a region of weight space in which weight vectors lead to relatively simple discrimination boundaries. In that sense, careful initialization can be seen as performing a kind of regularization. This is consistent with the theoretical and experimental analysis performed by Raudys [8] in the case of linear classifiers, showing that suitable selection of initial weights may cancel the influence of dimensionality on the expected probability of misclassification.

5 Conclusion

A new weight initialization procedure for multilayer perceptrons has been presented. The procedure consists in using class-separability-preserving feature vectors as the initial hidden-layer weights; biases and output weights are then optimized separately, before fine-tuning of all network parameters is performed by a standard back-propagation algorithm. This scheme has been applied to several real-world and artificial discrimination tasks, and has been shown to yield lower generalization error than random initialization and (to a lesser extent) than the procedure proposed in [2]. Experimental results also suggest that introducing prior knowledge about the data structure, in the form of discriminant vectors, reduces the harmful effect of excessive parameters on the expected probability of misclassification. Our current work aims at combining this initialization procedure with a constructive training algorithm.

References

[1] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, Belmont, CA.

[2] T. Denœux and R. Lengellé. Initializing back-propagation networks with prototypes. Neural Networks, 6(3), 1993.
[3] K. Fukunaga. Introduction to Statistical Pattern Recognition. 2nd edition, Academic Press, 199.

[4] P. Gallinari, S. Thiria, F. Badran, and F. Fogelman-Soulié. On the relations between discriminant analysis and multilayer perceptrons. Neural Networks, 4:349–36.

[5] R. Lengellé and T. Denœux. Optimizing multilayer networks layer per layer without back-propagation. In I. Aleksander and J. Taylor, editors, Artificial Neural Networks II. North-Holland, Amsterdam.

[6] R. Lengellé and T. Denœux. Training MLPs layer by layer using an objective function for internal representations. Neural Networks (to appear).

[7] P. M. Murphy and D. W. Aha. UCI Repository of Machine Learning Databases [machine-readable data repository]. University of California, Department of Information and Computer Science, Irvine, CA.

[8] S. Raudys. Why do multilayer perceptrons have favorable small sample properties? In E. S. Gelsema and L. N. Kanal, editors, Pattern Recognition in Practice IV. Elsevier, Amsterdam.
Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:
More informationUniversity of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout 2:. The Multivariate Gaussian & Decision Boundaries
University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout :. The Multivariate Gaussian & Decision Boundaries..15.1.5 1 8 6 6 8 1 Mark Gales mjfg@eng.cam.ac.uk Lent
More informationJUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson
JUST THE MATHS UNIT NUMBER 10.2 DIFFERENTIATION 2 (Rates of change) by A.J.Hobson 10.2.1 Introuction 10.2.2 Average rates of change 10.2.3 Instantaneous rates of change 10.2.4 Derivatives 10.2.5 Exercises
More informationTechnion - Computer Science Department - M.Sc. Thesis MSC Constrained Codes for Two-Dimensional Channels.
Technion - Computer Science Department - M.Sc. Thesis MSC-2006- - 2006 Constraine Coes for Two-Dimensional Channels Keren Censor Technion - Computer Science Department - M.Sc. Thesis MSC-2006- - 2006 Technion
More informationIntroduction to the Vlasov-Poisson system
Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its
More informationarxiv: v4 [cs.ds] 7 Mar 2014
Analysis of Agglomerative Clustering Marcel R. Ackermann Johannes Blömer Daniel Kuntze Christian Sohler arxiv:101.697v [cs.ds] 7 Mar 01 Abstract The iameter k-clustering problem is the problem of partitioning
More informationStable and compact finite difference schemes
Center for Turbulence Research Annual Research Briefs 2006 2 Stable an compact finite ifference schemes By K. Mattsson, M. Svär AND M. Shoeybi. Motivation an objectives Compact secon erivatives have long
More informationA Course in Machine Learning
A Course in Machine Learning Hal Daumé III 12 EFFICIENT LEARNING So far, our focus has been on moels of learning an basic algorithms for those moels. We have not place much emphasis on how to learn quickly.
More informationLATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION
The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische
More informationRank, Trace, Determinant, Transpose an Inverse of a Matrix Let A be an n n square matrix: A = a11 a1 a1n a1 a an a n1 a n a nn nn where is the jth col
Review of Linear Algebra { E18 Hanout Vectors an Their Inner Proucts Let X an Y be two vectors: an Their inner prouct is ene as X =[x1; ;x n ] T Y =[y1; ;y n ] T (X; Y ) = X T Y = x k y k k=1 where T an
More informationSeparation of Variables
Physics 342 Lecture 1 Separation of Variables Lecture 1 Physics 342 Quantum Mechanics I Monay, January 25th, 2010 There are three basic mathematical tools we nee, an then we can begin working on the physical
More informationCMA-ES with Optimal Covariance Update and Storage Complexity
CMA-ES with Optimal Covariance Upate an Storage Complexity Oswin Krause Dept. of Computer Science University of Copenhagen Copenhagen, Denmark oswin.krause@i.ku.k Díac R. Arbonès Dept. of Computer Science
More informationLECTURE NOTES ON DVORETZKY S THEOREM
LECTURE NOTES ON DVORETZKY S THEOREM STEVEN HEILMAN Abstract. We present the first half of the paper [S]. In particular, the results below, unless otherwise state, shoul be attribute to G. Schechtman.
More informationIntroduction to Markov Processes
Introuction to Markov Processes Connexions moule m44014 Zzis law Gustav) Meglicki, Jr Office of the VP for Information Technology Iniana University RCS: Section-2.tex,v 1.24 2012/12/21 18:03:08 gustav
More informationTrack Initialization from Incomplete Measurements
Track Initialiation from Incomplete Measurements Christian R. Berger, Martina Daun an Wolfgang Koch Department of Electrical an Computer Engineering, University of Connecticut, Storrs, Connecticut 6269,
More informationEstimation of the Maximum Domination Value in Multi-Dimensional Data Sets
Proceeings of the 4th East-European Conference on Avances in Databases an Information Systems ADBIS) 200 Estimation of the Maximum Domination Value in Multi-Dimensional Data Sets Eleftherios Tiakas, Apostolos.
More informationRobustness and Perturbations of Minimal Bases
Robustness an Perturbations of Minimal Bases Paul Van Dooren an Froilán M Dopico December 9, 2016 Abstract Polynomial minimal bases of rational vector subspaces are a classical concept that plays an important
More informationContents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)
Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture
More informationTMA 4195 Matematisk modellering Exam Tuesday December 16, :00 13:00 Problems and solution with additional comments
Problem F U L W D g m 3 2 s 2 0 0 0 0 2 kg 0 0 0 0 0 0 Table : Dimension matrix TMA 495 Matematisk moellering Exam Tuesay December 6, 2008 09:00 3:00 Problems an solution with aitional comments The necessary
More informationSYNCHRONOUS SEQUENTIAL CIRCUITS
CHAPTER SYNCHRONOUS SEUENTIAL CIRCUITS Registers an counters, two very common synchronous sequential circuits, are introuce in this chapter. Register is a igital circuit for storing information. Contents
More informationTopic Modeling: Beyond Bag-of-Words
Hanna M. Wallach Cavenish Laboratory, University of Cambrige, Cambrige CB3 0HE, UK hmw26@cam.ac.u Abstract Some moels of textual corpora employ text generation methos involving n-gram statistics, while
More informationLocal Linear ICA for Mutual Information Estimation in Feature Selection
Local Linear ICA for Mutual Information Estimation in Feature Selection Tian Lan, Deniz Erogmus Department of Biomeical Engineering, OGI, Oregon Health & Science University, Portlan, Oregon, USA E-mail:
More informationLinear First-Order Equations
5 Linear First-Orer Equations Linear first-orer ifferential equations make up another important class of ifferential equations that commonly arise in applications an are relatively easy to solve (in theory)
More informationSparse Reconstruction of Systems of Ordinary Differential Equations
Sparse Reconstruction of Systems of Orinary Differential Equations Manuel Mai a, Mark D. Shattuck b,c, Corey S. O Hern c,a,,e, a Department of Physics, Yale University, New Haven, Connecticut 06520, USA
More informationInterpretation of the Multi-Stage Nested Wiener Filter in the Krylov Subspace Framework
Interpretation of the Multi-Stage Neste Wiener Filter in the Krylov Subspace Framework M. Joham an M. D. Zoltowski School of Electrical Engineering, Purue University, West Lafayette, IN 47907-285 e-mail:
More informationA Randomized Approximate Nearest Neighbors Algorithm - a short version
We present a ranomize algorithm for the approximate nearest neighbor problem in - imensional Eucliean space. Given N points {x } in R, the algorithm attempts to fin k nearest neighbors for each of x, where
More informationApplied Statistics. Multivariate Analysis - part II. Troels C. Petersen (NBI) Statistics is merely a quantization of common sense 1
Applied Statistics Multivariate Analysis - part II Troels C. Petersen (NBI) Statistics is merely a quantization of common sense 1 Fisher Discriminant You want to separate two types/classes (A and B) of
More informationSYSTEMS OF DIFFERENTIAL EQUATIONS, EULER S FORMULA. where L is some constant, usually called the Lipschitz constant. An example is
SYSTEMS OF DIFFERENTIAL EQUATIONS, EULER S FORMULA. Uniqueness for solutions of ifferential equations. We consier the system of ifferential equations given by x = v( x), () t with a given initial conition
More informationState observers and recursive filters in classical feedback control theory
State observers an recursive filters in classical feeback control theory State-feeback control example: secon-orer system Consier the riven secon-orer system q q q u x q x q x x x x Here u coul represent
More informationLogarithmic spurious regressions
Logarithmic spurious regressions Robert M. e Jong Michigan State University February 5, 22 Abstract Spurious regressions, i.e. regressions in which an integrate process is regresse on another integrate
More information