Influence of weight initialization on multilayer perceptron performance


M. Karouia (1,2), T. Denœux (1), R. Lengellé (1)

(1) Université de Compiègne, U.R.A. CNRS 817 Heudiasyc, BP, F-60206 Compiègne cedex, France
mkarouia@hds.univ-compiegne.fr
(2) Lyonnaise des Eaux (LIAC)

Abstract

This paper presents a new algorithm for initializing the weights in multilayer perceptrons. The method is based on the use of feature vectors extracted by discriminant analysis. Simulations carried out with real-world and synthetic data sets show that the proposed algorithm yields a better initial state than random initialization. As a result, training time is reduced and a lower generalization error can be achieved. Additionally, numerical simulations show that the generalization performance of networks initialized with the proposed method becomes less sensitive to network size and input dimension.

1 Introduction

Many researchers have emphasized the importance of initial weights in multilayer perceptron (MLP) training, and several initialization algorithms have been proposed, such as the use of prototypes [2]. The most obvious potential benefits of starting optimization from a good initial state are faster training and a higher probability of reaching a deep minimum of the error function. Additionally, it has been found that introducing prior knowledge in the initial weights may in some cases improve generalization performance [2, 8].

In this paper, a new approach to weight initialization is proposed, and its effect on generalization is demonstrated experimentally. The starting point of this work is the relationship between MLPs and discriminant analysis (DA) pointed out by Gallinari [4]: it can be shown that training a network with one hidden layer using the quadratic error function is equivalent to maximizing a measure of class separability in the space spanned by the hidden units. DA techniques aim at extracting features that are effective in preserving class separability. The algorithm presented in this paper (WIDA: Weight Initialization by Discriminant Analysis) uses such features to initialize the weights of multilayer networks before training by standard back-propagation (BP) or any other learning procedure. The performance of the WIDA method is then analyzed using several synthetic and real-world data sets. We examine the effect of weight initialization on the following aspects: convergence speed (training time), generalization error, and the sensitivity of the generalization error to data dimensionality and number of hidden units.

2 The initialization method

2.1 Discriminant analysis

We consider a set X of N samples in a d-dimensional space. The samples are assumed to be partitioned into M disjoint subsets. Subset X_i, of size N_i, contains the samples associated with class Ω_i. Let x_{ij} be the j-th d-dimensional sample vector from class Ω_i. The mean vector of class Ω_i is

m_i = \frac{1}{N_i} \sum_{j=1}^{N_i} x_{ij},

and the overall mean vector is

m = \frac{1}{N} \sum_{i=1}^{M} N_i m_i.

We define the parametric within-class scatter matrix W and the parametric between-class scatter matrix B respectively as:

W = \frac{1}{N} \sum_{i=1}^{M} \sum_{j=1}^{N_i} (x_{ij} - m_i)(x_{ij} - m_i)^T    (1)

B = \frac{1}{N} \sum_{i=1}^{M} N_i (m_i - m)(m_i - m)^T    (2)

where (\cdot)^T denotes transposition.
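Equations (1) and (2) translate directly into code. The following is a minimal NumPy sketch of these definitions (ours, not the authors'):

```python
import numpy as np

def scatter_matrices(X, y):
    """Parametric within-class (W) and between-class (B) scatter
    matrices, following equations (1) and (2)."""
    N, d = X.shape
    m = X.mean(axis=0)                      # overall mean vector m
    W = np.zeros((d, d))
    B = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]                      # samples of class c
        mc = Xc.mean(axis=0)                # class mean m_i
        D = Xc - mc
        W += D.T @ D                        # sum_j (x_ij - m_i)(x_ij - m_i)^T
        B += len(Xc) * np.outer(mc - m, mc - m)
    return W / N, B / N                     # both normalized by N
```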

Matrix W is assumed to be positive definite, so that W^{-1} exists. Matrix B is positive semidefinite, with rank at most M − 1 (we assume that d ≥ M). The sum of W and B gives the parametric global covariance matrix G.

In parametric discriminant analysis (PDA), we seek d-dimensional feature vectors τ maximizing Fisher's criterion J(τ):

J(\tau) = \frac{\tau^T B \tau}{\tau^T W \tau}    (3)

Such features are obtained as the eigenvectors of W^{-1}B, each eigenvalue λ_i being equal to the Fisher criterion of its corresponding eigenvector τ_i (J(τ_i) = λ_i).

PDA has two serious shortcomings. First, the maximum number of discriminant vectors is limited to M − 1; when M = 2, PDA can extract only one discriminant vector. The second and more fundamental problem is the intrinsically parametric nature of PDA: when the class distributions are significantly non-normal, PDA cannot be expected to determine good features that preserve the complex structure needed for classification.

Non-parametric discriminant analysis (NPDA) was introduced to overcome both of the aforementioned problems [3]. It is based on a non-parametric between-class scatter matrix that measures between-class scatter on a local basis, using a k-nearest-neighbor (k-NN) approach. Let us first consider the case where M = 2. Let n_{il}(x) ∈ X_i (l = 1, …, k) be the k nearest neighbors in class Ω_i of an arbitrary sample x ∈ X. The local mean of class Ω_i (the sample mean of the k NNs from Ω_i to x) is

m_{ki}(x) = \frac{1}{k} \sum_{l=1}^{k} n_{il}(x).

The non-parametric between-class scatter matrix is then defined as

B_{12,k} = \frac{1}{N} \left( \sum_{x \in X_1} p_{12}(x)\,(x - m_{k2}(x))(x - m_{k2}(x))^T + \sum_{x \in X_2} p_{12}(x)\,(x - m_{k1}(x))(x - m_{k1}(x))^T \right)    (4)

The term p_{12}(x) is defined as a function of the distances between x and its k-th nearest neighbor from each class [3]. Its role is to de-emphasize the samples located far away from the class boundary.
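To make the eigen-step and equation (4) concrete, here is a hedged NumPy/SciPy sketch (not the paper's code). The boundary weighting p_12(x) is simplified to a uniform weight; its exact form is given in [3]:

```python
import numpy as np
from scipy.linalg import eigh

def discriminant_vectors(B, W, n_vectors):
    """Eigenvectors of W^{-1} B, sorted by decreasing Fisher criterion,
    obtained by solving the generalized problem B t = lambda W t."""
    eigvals, eigvecs = eigh(B, W)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:n_vectors]].T  # rows are discriminant vectors

def local_means(X_own, X_query, k):
    """m_{ki}(x): mean of the k nearest neighbors, within one class,
    of each query point."""
    d2 = ((X_query[:, None, :] - X_own[None, :, :]) ** 2).sum(-1)
    knn = np.argsort(d2, axis=1)[:, :k]
    return X_own[knn].mean(axis=1)

def nonparametric_B(X1, X2, k):
    """Simplified B_{12,k} (eq. 4) with p_12(x) set to 1 for all x."""
    N = len(X1) + len(X2)
    D1 = X1 - local_means(X2, X1, k)        # x - m_{k2}(x), for x in X_1
    D2 = X2 - local_means(X1, X2, k)        # x - m_{k1}(x), for x in X_2
    return (D1.T @ D1 + D2.T @ D2) / N
```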

By substituting B with B_{12,k} in Equation 3, we obtain a non-parametric Fisher criterion J'(τ). The features maximizing J'(τ) are obtained as the eigenvectors of W^{-1} B_{12,k}. Since B_{12,k} is generally of full rank, the number of discriminant vectors is no longer limited to M − 1.

To extend NPDA to general M-class problems, two alternatives have been studied. The first consists in considering M two-class problems, or dichotomies: for each dichotomy, one class is taken as Ω_1 and the remaining M − 1 classes as Ω_2, and discriminant vectors are extracted by the above procedure; the best discriminant vectors can then be chosen according to some selection procedure. The second alternative consists in defining a generalized non-parametric between-class scatter matrix as

B_k = \frac{1}{N^2} \sum_{i<j} N_i N_j B_{ij,k}.

2.2 Application to weight initialization

The WIDA method consists in initializing the hidden-unit weights as discriminant vectors extracted by non-parametric DA, and adding bias terms. Learning is then carried out in three steps (a sketch of the resulting initialization is given after this list):

1. The biases of the hidden neurons are determined so as to maximize class separability in the space H spanned by the hidden units. As shown in [5, 6], a suitable measure of class separability is tr(G_h^{-1} B_h), where G_h and B_h are respectively the total and between-class scatter matrices in H.

2. The hidden-to-output weights are initialized randomly and trained separately to minimize the mean squared output error.

3. Finally, further training of the whole network is performed using the standard back-propagation algorithm.
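The three steps can be assembled as below, reusing the helpers sketched above. This is our simplified reading, not the authors' implementation: step 1's optimization of tr(G_h^{-1} B_h) is replaced by a crude heuristic that centers each hidden hyperplane on the overall data mean, and the example is restricted to two classes:

```python
import numpy as np

def wida_init(X, y, n_hidden, k=5):
    """WIDA-style initialization, simplified two-class sketch.

    Hidden weights: discriminant vectors of W^{-1} B_{12,k} (eq. 4).
    Biases: heuristic standing in for step 1 (hyperplanes pass through
    the overall mean instead of maximizing tr(G_h^{-1} B_h)).
    Output weights: step 2, least squares on the hidden activations.
    """
    W_s, _ = scatter_matrices(X, y)                  # eq. (1)
    B_np = nonparametric_B(X[y == 0], X[y == 1], k)  # eq. (4), p_12 = 1
    V = discriminant_vectors(B_np, W_s, n_hidden)    # shape (n_hidden, d)
    b = -V @ X.mean(axis=0)                          # heuristic biases (ours)
    H = np.tanh(X @ V.T + b)                         # hidden activations
    T = np.eye(2)[y]                                 # one-hot targets
    ridge = 1e-6 * np.eye(n_hidden)                  # small regularizer
    W_out = np.linalg.solve(H.T @ H + ridge, H.T @ T)
    return V, b, W_out   # step 3: fine-tune everything with standard BP
```

Step 3 then fine-tunes (V, b, W_out) jointly with back-propagation, exactly as from any other starting point.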

3 Comparison to random initialization

The above initialization procedure was tested and compared to other methods using the following data sets:

Waveform data: a three-class synthetic problem in a 21-dimensional feature space. Training and test sets both contain 100 samples of each class [1].

Vowel data: training and test data have 10 features and are partitioned into 11 classes. We used 528 randomly chosen samples for training and the 462 remaining samples for testing. A complete description of these data is given in [7].

Sonar data: a real-world classification task [7] with 60 features and 2 classes. Training and test sets are both of size 104.

The network weights were initialized with the WIDA algorithm, with the prototype method, and randomly. For each classification task, the number n of hidden units was varied from 2 to n_max. Training and test misclassification error rates were computed after each learning cycle, and the algorithm was run 10 times for each value of n and each initialization method. Figure 1 shows the evolution of the mean error rates as a function of time for the three tasks; the means of the best error rates obtained at each trial by the three methods are shown in Figure 2 as a function of n.

Figure 1: Mean test misclassification rate (%) as a function of training cycles (averages over 10 trials), for the vowel data (11 hidden units), the sonar data (5 hidden units) and the waveform data (4 hidden units). Solid line: random; dashed: WIDA; dash-dotted: prototype method.

As expected, these results show that the WIDA method provides good initial solutions in terms of misclassification error. This results in faster training, although the gain is not very large because we used an accelerated version of back-propagation. The main advantage of our method turns out to be better generalization performance on all three classification tasks: the test error rates obtained with the WIDA method were always significantly lower than those obtained with random initialization (and, to a lesser extent, than those obtained with the prototype method). A sketch of the comparison protocol follows.
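The comparison protocol is easy to express in code. The sketch below is ours, with the trainer injected as a parameter, since the accelerated back-propagation variant used in the paper is not detailed here:

```python
import numpy as np

def compare_initializations(X_tr, y_tr, X_te, y_te, n_values, init_methods,
                            train_fn, n_trials=10):
    """Protocol behind Figures 1-2: for each initializer and each number
    of hidden units, average the best test error over repeated runs.

    `init_methods` maps a name to a function (X, y, n) -> initial parameters;
    `train_fn(params, X_tr, y_tr, X_te, y_te)` must return the sequence of
    test error rates over training cycles (any trainer can be plugged in).
    """
    results = {}
    for name, init in init_methods.items():
        results[name] = []
        for n in n_values:
            best_errors = [min(train_fn(init(X_tr, y_tr, n),
                                        X_tr, y_tr, X_te, y_te))
                           for _ in range(n_trials)]
            results[name].append(np.mean(best_errors))
    return results
```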

Figure 2: Mean test misclassification rate (%) as a function of n (averages over 10 trials), for the vowel, sonar and waveform data. Solid line: random; dashed: WIDA; dash-dotted: prototype method.

4 Influence of dimensionality and network size

The influence of dimensionality and of the number of weights on generalization performance was studied experimentally using a set of discrimination tasks similar to those used in [8]. Each task consists in discriminating between two multivariate Gaussian classes. Both classes have identity covariance matrix, and the class mean vectors are m_1 = (2, 0, …, 0) and m_2 = −m_1. This parameterization keeps the Mahalanobis distance, and hence the theoretical Bayes error rate, constant across dimensions. Training sets of 120 samples (60 in each class) and test sets of 400 samples (200 in each class) were randomly generated. The two initialization procedures tested were the WIDA method and random initialization. The number n of hidden units was varied from 2 to 10, and the data dimension d from 10 to 100 in steps of 10. For each configuration, the learning algorithm was run 10 times, and the mean misclassification error rates were computed over the 10 trials. Figure 3 shows the obtained mean misclassification rates, with 95% confidence intervals, as a function of d and n. A data-generation sketch follows.
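This synthetic task is straightforward to reproduce. The sketch below (ours; sample sizes as reconstructed above) generates one training and one test set. Note that the Mahalanobis distance between the class means is ||m_1 − m_2|| = 4 for every d, so the Bayes error stays at Φ(−2) ≈ 2.3% throughout:

```python
import numpy as np

def gaussian_task(d, n_train_per_class=60, n_test_per_class=200, seed=0):
    """Two spherical Gaussian classes in R^d with means +/- (2, 0, ..., 0).
    The Bayes error is constant in d: Phi(-2), about 2.3%."""
    rng = np.random.default_rng(seed)
    m1 = np.zeros(d)
    m1[0] = 2.0                                 # m_2 = -m_1
    def sample(n):
        X1 = rng.standard_normal((n, d)) + m1   # class 1, identity covariance
        X2 = rng.standard_normal((n, d)) - m1   # class 2
        X = np.vstack([X1, X2])
        y = np.concatenate([np.zeros(n, int), np.ones(n, int)])
        return X, y
    return sample(n_train_per_class), sample(n_test_per_class)
```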

Figure 3: Mean test misclassification rate (Egen = generalization error) and 95% confidence interval as a function of the data dimension d and the number of hidden units NHU = n (averages over 10 trials); one panel per value, NHU = 2 through 10. -*- : WIDA initialization, -o- : random initialization.

As shown in Figure 3, the generalization performance of randomly initialized networks degrades for large values of d and n. This dependency of the test error rate on the number of parameters to be estimated is well known in the pattern recognition and neural network literature as the peaking phenomenon [8]. The phenomenon turns out to be less pronounced, in this case, when the initial weights are determined by discriminant analysis: the rate of increase of the test error rate as a function of d is smaller, and practically independent of n for 2 ≤ n ≤ 10. This finding can be interpreted by remarking that the WIDA method provides the learning algorithm with prior information about the data structure, in the form of discriminant axes. This restricts the search to a region of weight space in which weight vectors lead to relatively simple discrimination boundaries. In that sense, careful initialization can be seen as performing a kind of regularization. This is consistent with the theoretical and experimental analysis performed by Raudys [8] in the case of linear classifiers, showing that a suitable selection of the initial weights may cancel the influence of dimensionality on the expected probability of misclassification.

5 Conclusion

A new weight initialization procedure for multilayer perceptrons has been presented. It consists in using class-separability-preserving feature vectors as the initial hidden-layer weights. Biases and output weights are then optimized separately, before fine-tuning of all network parameters is performed by a standard back-propagation algorithm. This scheme has been applied to several real-world and artificial discrimination tasks, and has been shown to yield lower generalization error than random initialization and (to a lesser extent) than the procedure proposed in [2]. Experimental results also suggest that introducing prior knowledge about the data structure, in the form of discriminant vectors, reduces the harmful effect of excessive parameters on the expected probability of misclassification. Our current work aims at combining this initialization procedure with a constructive training algorithm.

References

[1] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, Belmont, CA, 1984.

[2] T. Denœux and R. Lengellé. Initializing back-propagation networks with prototypes. Neural Networks, 6(3):351-363, 1993.

[3] K. Fukunaga. Introduction to Statistical Pattern Recognition. 2nd edition, Academic Press, 1990.

[4] P. Gallinari, S. Thiria, F. Badran, and F. Fogelman-Soulié. On the relations between discriminant analysis and multilayer perceptrons. Neural Networks, 4:349-360, 1991.

[5] R. Lengellé and T. Denœux. Optimizing multilayer networks layer per layer without back-propagation. In I. Aleksander and J. Taylor, editors, Artificial Neural Networks II. North-Holland, Amsterdam, 1992.

[6] R. Lengellé and T. Denœux. Training MLPs layer by layer using an objective function for internal representations. Neural Networks (to appear).

[7] P. M. Murphy and D. W. Aha. UCI Repository of machine learning databases [machine-readable data repository]. University of California, Department of Information and Computer Science, Irvine, CA.

[8] S. Raudys. Why do multilayer perceptrons have favorable small sample properties? In E. S. Gelsema and L. N. Kanal, editors, Pattern Recognition in Practice IV, Amsterdam, 1994. Elsevier.
