Some Applications of Stochastic Gradient Processes
Applied Mathematical Sciences, Vol. 2, 2008, no. 25

A. Bennar (1), A. Bouamaine (2) and A. Namir (1)

(1) Département de Mathématiques et Informatique, Faculté des Sciences Ben M'sik, Université Hassan II Mohammedia, B.P., Casablanca, Maroc

(2) Ecole Nationale Supérieure d'Electricité et de Mécanique, Equipe de Recherche : Architecture des Systèmes, Université Hassan II, Ain Chok, B.P., Casablanca, Maroc

Abstract

We consider a stochastic gradient process, used for the estimation of a conditional expectation:
$$X_{n+1} = X_n - a_n\,\nabla_x\phi(V_n, X_n)\,(\phi(V_n, X_n) - U_n).$$
We give one theorem of almost sure convergence and one theorem of convergence in quadratic mean. Several applications are given: linear estimation of a conditional expectation, sequential estimation of the parameters of a mixture of laws in classification, estimation of an observable function at random points, estimation of a function $h(x) = E[Z(x)]$, estimation of the parameters of a linear regression, and estimation of a Bayesian discriminant function.

Mathematics Subject Classification: 62L20

Keywords: Stochastic approximation, conditional expectation, stochastic gradient
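To fix ideas before the formal statements, here is a minimal numerical sketch of the process in the simplest possible setting (the setup, constants and step sizes below are our illustrative choices, not the paper's):

```python
import numpy as np

# Simplest instance of the process (illustrative, not from the paper):
# take phi(v, x) = x, so grad_x phi = 1 and the recursion becomes
#   X_{n+1} = X_n - a_n (X_n - U_n).
# With a_n = 1/n this is exactly the running sample mean of U_1, ..., U_n,
# which converges a.s. to E[U] = E[E[U|V]] by the strong law of large numbers.
rng = np.random.default_rng(0)
us = 2.0 + rng.standard_normal(100000)   # draws U_n with E[U] = 2

x = 0.0                                  # X_1 = 0
for n, u in enumerate(us, start=1):
    x = x - (1.0 / n) * (x - u)          # a_n = 1/n satisfies (H1)

print(x)  # equals the sample mean of us, close to E[U] = 2
```

For a general $\phi$ the same recursion performs a noisy gradient descent on $g(x) = E[(\phi(V,x) - U)^2]$, and the hypotheses H1 to H5 below govern its almost sure behaviour.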
1. Introduction

We consider a random vector $X_n$ in $\mathbb{R}^p$ defined by:
$$X_{n+1} = X_n - a_n\,\nabla_x\phi(V_n, X_n)\,(\phi(V_n, X_n) - U_n)$$
where $(a_n)$ is a sequence of positive real numbers; $(U_1,V_1), (U_2,V_2), \ldots, (U_n,V_n)$ is a sample of independent couples of random variables with the same probability law as $(U,V)$; and $\phi(\cdot,\cdot)$ is a known real measurable function on $\mathbb{R}^k \times \mathbb{R}^p$.

In what follows, $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$ are respectively the usual inner product and norm in $\mathbb{R}^k$; $A'$ denotes the transpose of the matrix $A$ and $\lambda_{\min}(B)$ the smallest eigenvalue of $B$; the abbreviation a.s. means almost surely and q.m. means quadratic mean. We write $g(x) = E[(\phi(V,x) - U)^2]$ for the criterion minimized by the process.

2. Convergence

2.1 Almost Sure Convergence

Let us make the following hypotheses:

(H1) $a_n > 0$, $\sum_{n\ge 1} a_n = \infty$, $\sum_{n\ge 1} a_n^2 < \infty$.

(H2) There exist $a$ and $b$ such that, for all $x = (x_1, x_2, \ldots, x_p) \in \mathbb{R}^p$ and all $i = 1, 2, \ldots, p$,
$$\operatorname{Var}\!\left[\frac{\partial\phi(V,x)}{\partial x_i}\,(\phi(V,x) - U)\right] < a\,g(x) + b.$$

(H3) There exists $K > 0$ such that, for all $x = (x_1, x_2, \ldots, x_p)$ and all $i, j = 1, 2, \ldots, p$,
$$\left|\frac{\partial^2 g(x)}{\partial x_i\,\partial x_j}\right| < K.$$

(H4) $\theta^*$ is a local minimum of $g$: there exists $\alpha > 0$ such that $(x \neq \theta^*,\ \|x - \theta^*\| < \alpha) \Rightarrow g(\theta^*) < g(x)$.
(H5) $\theta^*$ is the unique stationary point of $g$: for all $x \in \mathbb{R}^p$, $x \neq \theta^* \Rightarrow \nabla_x g(x) \neq 0$.

Theorem 2.1
Under hypotheses H1, H2, H3, H4, H5, we have:
$$X_n \xrightarrow{\ a.s.\ } \theta^* \quad \text{or} \quad \|X_n\| \xrightarrow{\ a.s.\ } +\infty.$$

Proof. See [3].

2.2 Quadratic Mean Convergence

In this subsection the process is written, as in [3], in the variables $(X, Y)$ and parameter $\theta$: $\theta_{n+1} = \theta_n - a_n\,\nabla_\theta\phi(X_n,\theta_n)\,(\phi(X_n,\theta_n) - Y_n)$, with $(X_1,Y_1), \ldots, (X_n,Y_n)$ a sample of independent couples distributed as $(X, Y)$. Let us make the following hypotheses:

(H8) $\phi(x,\theta)$ and $\nabla_\theta\phi(x,\theta)$ are uniformly bounded in $x$ and $\theta$.

(H9) There exist two positive real functions $h$ and $h'$ such that, for all $\theta, \theta' \in \mathbb{R}^p$ and all $x \in \mathbb{R}^q$:
$$|\phi(x,\theta) - \phi(x,\theta')| \le h(x)\,\|\theta - \theta'\|,$$
$$\|\nabla_\theta\phi(x,\theta) - \nabla_\theta\phi(x,\theta')\| \le h'(x)\,\|\theta - \theta'\|,$$
with $E[h(X)] < \infty$ and $E[h'(X)] < \infty$.

(H10) $Y$ is a bounded real random variable.

Theorem 2.2
Under hypotheses H1, H3, H8, H9, H10, we have:
$$\nabla_\theta g(\theta_n) \xrightarrow{\ a.s.\ } 0 \quad \text{and} \quad \nabla_\theta g(\theta_n) \xrightarrow{\ q.m.\ } 0.$$

Proof. See [3].

3. Applications

3.1 Sequential Estimation of a Conditional Expectation by the Linear Model
Let $\rho_1, \rho_2, \ldots, \rho_p$ be $p$ known measurable functions of $q$ real variables, and put $\rho = (\rho_1, \rho_2, \ldots, \rho_p)'$. To estimate the value $\theta^*$ of $x$ that minimizes $E\big[(E[U/V] - x'\rho(V))^2\big]$, we consider the stochastic approximation process $(X_n)$ in $\mathbb{R}^p$ defined by:
$$X_{n+1} = X_n - a_n\,\rho(V_n)\,(\rho'(V_n)X_n - U_n),$$
where $(U_1,V_1), (U_2,V_2), \ldots, (U_n,V_n)$ is a sample of $(U,V)$ formed of independent and identically distributed random variables.

Let us make the hypotheses:

(H6) $\rho_1(V), \rho_2(V), \ldots, \rho_p(V)$ are linearly independent.

(H7) The moments of order 4 of the vector $(\rho_1(V), \rho_2(V), \ldots, \rho_p(V), U)$ exist.

(H8) $X_1$ is a random variable such that $E[\|X_1\|^2] < \infty$.

Corollary
Under hypotheses H1, H6, H7, H8, we have: $X_n \xrightarrow{\ a.s.\ } \theta^*$.

Proof. Let $\phi$ be the real function on $\mathbb{R}^q \times \mathbb{R}^p$ defined by:
$$\phi(V,x) = x'\rho(V) = \sum_{j=1}^p x_j\,\rho_j(V).$$
For $j = 1, 2, \ldots, p$ we have $\dfrac{\partial\phi(V,x)}{\partial x_j} = \rho_j(V)$, so that $\nabla_x\phi(V,x) = \rho(V)$.

Let $A = E[\rho(V)\rho'(V)]$. Under H6 and H7, the matrix $A$ is symmetric positive definite, therefore invertible. Then $\theta^*$ is the unique solution of the equation
$$\nabla_x g(x) = 2E[\rho(V)(\rho'(V)x - U)] = 0,$$
and we have $\theta^* = A^{-1}E[\rho(V)U]$.
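As a numerical sketch of this corollary (the basis $\rho(v) = (1, v)'$, the law of $(U,V)$ and the steps $a_n = 1/n$ below are our illustrative assumptions, not the paper's), the process can be simulated and the iterate compared with the closed form $\theta^* = A^{-1}E[\rho(V)U]$:

```python
import numpy as np

# Illustrative setup (ours, not from the paper): V ~ N(0,1),
# U = 1 + 2V + noise, basis rho(v) = (1, v).  Then A = E[rho(V) rho(V)'] = I
# (identity), so theta* = A^{-1} E[rho(V) U] = (E[U], E[VU]) = (1, 2).
rng = np.random.default_rng(0)

def rho(v):
    return np.array([1.0, v])

x = np.zeros(2)                  # X_1 = 0, so E[||X_1||^2] < inf (H8)
for n in range(1, 50001):
    v = rng.standard_normal()
    u = 1.0 + 2.0 * v + 0.1 * rng.standard_normal()
    r = rho(v)
    # X_{n+1} = X_n - a_n rho(V_n) (rho'(V_n) X_n - U_n), with a_n = 1/n,
    # which satisfies (H1): sum a_n = inf, sum a_n^2 < inf.
    x = x - (1.0 / n) * r * (r @ x - u)

print(x)  # close to theta* = (1, 2)
```

The same loop works for any measurable basis $\rho$; only the closed-form target changes with $A$ and $E[\rho(V)U]$.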
Let us prove that hypothesis H2 is verified. We have
$$g(x) = E[U^2] + \|x\|_A^2 - 2\langle x, \theta^*\rangle_A = E[U^2] + \|x - \theta^*\|_A^2 - \|\theta^*\|_A^2$$
$$\ge c\,\|x - \theta^*\|^2 + d \qquad (c = \lambda_{\min}(A),\ d = E[U^2] - \|\theta^*\|_A^2)$$
$$\ge \tfrac{1}{2}\,c\,\|x\|^2 - c\,\|\theta^*\|^2 + d \ \ge\ e\,\|x\|^2 + f \qquad (e = \tfrac{c}{2},\ f = d - c\,\|\theta^*\|^2).$$
Therefore, for $i = 1, 2, \ldots, p$, using $\operatorname{Var}[W] \le E[W^2]$ and $(s+t)^2 \le 2s^2 + 2t^2$:
$$\operatorname{Var}\!\left[\frac{\partial\phi(V,x)}{\partial x_i}\,(\phi(V,x) - U)\right] \le E[\rho_i^2(V)(x'\rho(V) - U)^2] \le a\,\|x\|^2 + b$$
$$(a = 2E[\rho_i^2(V)\,\|\rho(V)\|^2],\ b = 2E[\rho_i^2(V)U^2])$$
$$\le a'\,g(x) + b' \qquad (a' = \tfrac{a}{e},\ b' = b - \tfrac{af}{e}).$$

Let us prove that hypothesis H3 is verified. For $i = 1, 2, \ldots, p$ we have
$$\frac{\partial g(x)}{\partial x_i} = 2E[(x'\rho(V) - U)\,\rho_i(V)],$$
therefore, for $i, j = 1, 2, \ldots, p$,
$$\frac{\partial^2 g(x)}{\partial x_i\,\partial x_j} = 2E[\rho_j(V)\rho_i(V)],$$
which does not depend on $x$ and is finite under H7.

The hypotheses of Theorem 2.1 are verified, therefore $X_n \xrightarrow{\ a.s.\ } \theta^*$ or $\|X_n\| \xrightarrow{\ a.s.\ } +\infty$.

Let us prove that we cannot have $\|X_n\| \xrightarrow{\ a.s.\ } +\infty$. Indeed, since $\sum_n a_n\,\|\nabla_x g(X_n)\|^2 < \infty$ a.s. (see [3], Lemma 2.1) and $\sum_{n\ge 1} a_n = +\infty$, there exists a subsequence of integers $(n_l)$ such that $\nabla_x g(X_{n_l}) \xrightarrow{\ a.s.\ } 0$. Besides,
$$\nabla_x g(x) = 2E[\rho(V)(\rho'(V)x - U)] = 2A(x - \theta^*),$$
therefore $\|\nabla_x g(X_n)\|^2 \ge 4\lambda_{\min}^2(A)\,\|X_n - \theta^*\|^2$ with $\lambda_{\min}(A) > 0$. Hence, if $\|X_n\| \xrightarrow{\ a.s.\ } +\infty$ then $\|\nabla_x g(X_n)\| \xrightarrow{\ a.s.\ } +\infty$, which contradicts the existence of the subsequence above.

3.2 Sequential Estimation of the Parameters of a Mixture of Laws in Classification

Let $X_1, X_2, \ldots, X_n, \ldots$ be a sample of $X$ formed of independent, identically distributed random variables of law $\mu$, with values in $\mathbb{R}^q$, defined on a probability space $(\Omega, \mathcal{A}, P)$. We suppose that $\mu$ is a mixture of laws:
$$\mu = \sum_{j=1}^r p_j\,\mu_j, \qquad \forall j:\ p_j \ge 0,\ \sum_{j=1}^r p_j = 1,$$
where each $\mu_j$ is a probability law on $\mathbb{R}^q$. Let $F$ (resp. $F_j$) be the distribution function of $\mu$ (resp. $\mu_j$). We have, for all $x \in \mathbb{R}^q$, $F(x) = \sum_{j=1}^r p_j F_j(x)$.

For $j = 1, 2, \ldots, r$, we suppose that $p_j$ depends on a parameter $\beta_j$ and $F_j$ on a multidimensional parameter $m_j$. Let $\theta = (\beta_1, \beta_2, \ldots, \beta_r, m_1, m_2, \ldots, m_r)$, let $p$ be the dimension of $\theta$, and let $\Phi$ be the real measurable function on $\mathbb{R}^p \times \mathbb{R}^q$ defined by:
$$\Phi(x, \theta) = \sum_{j=1}^r \pi(\beta_j)\,\Psi(m_j, x).$$
We suppose that the functions $\pi$ and $\Psi$ are known and verify: $\forall j,\ \pi(\beta_j) \ge 0$, $\sum_{j=1}^r \pi(\beta_j) = 1$, and for each $j$, $\Psi(m_j, \cdot)$ is a distribution function on $\mathbb{R}^q$.

We wish to determine the parameter $\theta^*$ of $\mathbb{R}^p$ such that $\Phi(x,\theta)$ approaches $F(x)$ in the least squares sense. Let $f(\theta) = E\big[(\Phi(X,\theta) - F(X))^2\big]$; we look for $\theta^*$ such that $f$ is minimal at $\theta = \theta^*$.

Let $Z$ be a random variable in $\mathbb{R}^q$ of law $\mu$ and $Y$ the indicator defined by:
$$Y = I_Z(X) = \begin{cases} 1 & \text{if } X \in D_Z \\ 0 & \text{otherwise} \end{cases}$$
with $D_z = \{x \in \mathbb{R}^q : z \le x\}$, where for $x = (x_1, \ldots, x_q)$ and $z = (z_1, \ldots, z_q)$, $z \le x$ means $z_j \le x_j$ for $j = 1, 2, \ldots, q$.

We have $E[I_Z(x)] = P(Z \le x) = F(x)$ and $E[Y/X] = E[I_Z(X)/X] = F(X)$. Therefore $f(\theta) = E\big[(E[Y/X] - \Phi(X,\theta))^2\big]$.

Let $g(\theta) = E[(Y - \Phi(X,\theta))^2] = E[(I_Z(X) - \Phi(X,\theta))^2]$. The problem of estimating the $\theta^*$ that minimizes $f$ becomes that of finding $\theta^*$ such that $g$ is minimal at $\theta = \theta^*$. We have:
$$\nabla_\theta g(\theta) = 2E\big[\nabla_\theta\Phi(X,\theta)\,(\Phi(X,\theta) - I_Z(X))\big].$$

To estimate $\theta^*$ by a sequential scheme, we define the process $(\Theta_n)$ in $\mathbb{R}^p$ by:
$$\Theta_{n+1} = \Theta_n - a_n\,\nabla_\theta\Phi(X_n, \Theta_n)\,(\Phi(X_n, \Theta_n) - I_{Z_n}(X_n)),$$
with $(X_1,Z_1), (X_2,Z_2), \ldots, (X_n,Z_n)$ a sample of $(X,Z)$ formed of independent, identically distributed random variables, and $(a_n)$ a sequence of positive real numbers.

Corollary
Under H1, H2, H3, H4, H5, we have $\Theta_n \xrightarrow{\ a.s.\ } \theta^*$ or $\|\Theta_n\| \xrightarrow{\ a.s.\ } +\infty$.

Proof. It is a consequence of the previous theorem.

3.3 Estimation of an Observable Function at Random Points

Let $h$ be a real function of $m$ real variables and $X$ a random variable with values in $\mathbb{R}^m$. We suppose that we can observe the real random variable $h(X)$. We have $E[h(X)/X] = h(X)$. We can approximate $h(X)$ by a linear combination of functions $\Upsilon_i(X)$, $i = 1, 2, \ldots, p$, and thus estimate the function $h$ by the general gradient method, with $Y = h(X)$.

3.4 Estimation of a Function $h(x) = E[Z(x)]$

Let $\{Z(x),\ x \in \mathbb{R}^m\}$ be a family of real random variables, let $E[Z(x)] = h(x)$, and let $X$ be a random variable with values in $\mathbb{R}^m$. We suppose that we can observe the real random variable $Z(X)$. We have $E[Z(X)/X] = h(X)$. We can approximate $h(X)$ by a linear combination of functions $\Upsilon_i(X)$, $i = 1, 2, \ldots, p$, and thus estimate the function $h$ by the general gradient method, with $Y = Z(X)$.

3.5 Estimation of the Parameters of a Linear Regression

The most direct application of the stochastic gradient method is linear regression. $Y$ is the explained random variable and $X_1, X_2, \ldots, X_m$ are the explanatory random variables, which constitute the variable $X \in \mathbb{R}^m$. We approximate $E[Y/X_1, X_2, \ldots, X_m]$ by a linear combination of the $\Upsilon_i(X)$, $i = 1, 2, \ldots, p$.

3.6 Estimation of a Bayesian Discriminant Function

We distinguish $r$ classes $C_1, C_2, \ldots, C_r$ in a set of individuals. To classify a new individual, we measure $m$ variables $X_1, X_2, \ldots, X_m$, which constitute the variable $X \in \mathbb{R}^m$. The use of the Bayesian classification rule requires knowledge of the posterior probabilities $P(C_i/X)$. Since $E[I_{C_i}/X] = P(C_i/X)$, these posterior probabilities can again be estimated by the general gradient method, taking for $Y$ the indicator of the class $C_i$.

References

[1] A. Bennar, Approximation stochastique : convergence dans le cas de plusieurs solutions et étude de modèles de corrélations, Thèse de doctorat de 3ème cycle, Université de Nancy I, 1985.

[2] A. Bennar, J.M. Monnez, Almost sure convergence of a stochastic approximation process in a convex set, International Journal of Applied Mathematics, vol. 20, no. 5, 2007.

[3] A. Bennar, A. Bouamaine, A. Namir, Almost sure convergence and in quadratic mean of the stochastic gradient process for the sequential estimation of a conditional expectation, Applied Mathematical Sciences, vol. 2, no. 8, 2008.

[4] A. Bouamaine, Méthodes d'approximation stochastique en analyse des données, Thèse de doctorat d'état, Université Mohamed V, 1996.
[5] E.M. Braverman, L.I. Rozonoer, Convergence of random processes in learning machines theory, Parts I and II, Automation and Remote Control, vol. 30, 1969.

[6] J. Dippon, Globally convergent stochastic optimization with optimal asymptotic distribution, Journal of Applied Probability, 35, 1998.

[7] D.A. Freedman, L.E. Dubins, A sharper form of the Borel-Cantelli lemma and the strong law, Annals of Mathematical Statistics, vol. 36.

[8] J.M. Monnez, Etude d'un processus général multidimensionnel d'approximation stochastique sous contraintes convexes. Applications à l'estimation statistique, Thèse de doctorat d'Etat ès Sciences Mathématiques, Université de Nancy I, 1982.

[9] J.M. Monnez, Almost sure convergence of stochastic gradient processes with matrix step sizes, Statistics and Probability Letters, 76, 2006.

[10] J.P. Parisot, Optimisation stochastique : le processus de Kiefer-Wolfowitz, essai de synthèse et quelques compléments, Thèse de doctorat de troisième cycle, Université de Nancy I, 1981.

[11] H. Robbins, D. Siegmund, A convergence theorem for nonnegative almost supermartingales and some applications, in: Optimizing Methods in Statistics, J.S. Rustagi (ed.), Academic Press, New York, 1971.

[12] H. Robbins, S. Monro, A stochastic approximation method, Annals of Mathematical Statistics, vol. 22, 1951.

[13] J.H. Venter, On convergence of the Kiefer-Wolfowitz approximation procedure, Annals of Mathematical Statistics, vol. 38.

[14] J.C. Spall, Introduction to Stochastic Search and Optimization, Wiley, Hoboken, New Jersey, 2003.
Received: November 19, 2007
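As a closing illustration of the sequential scheme of Section 3.2, the sketch below estimates the mixing weight of a two-component Gaussian mixture on $\mathbb{R}$ with known component means. All concrete choices (the component laws, the parameterization $\pi(\beta) = \operatorname{sigmoid}(\beta)$, the steps $a_n = n^{-0.6}$) are our illustrative assumptions, not the paper's:

```python
import math
import random

# Toy special case of the scheme of Section 3.2 (illustrative setup, ours):
# mu = p N(-2, 1) + (1 - p) N(2, 1), with component means known and only the
# mixing weight unknown.  We parameterize it as pi(beta) = sigmoid(beta), so
# the two weights are nonnegative and sum to 1.
random.seed(0)

P_TRUE = 0.3                     # true weight of the component centered at -2

def psi(m, x):                   # Psi(m, .): distribution function of N(m, 1)
    return 0.5 * (1.0 + math.erf((x - m) / math.sqrt(2.0)))

def sigmoid(b):
    return 1.0 / (1.0 + math.exp(-b))

def draw():                      # one draw from the mixture law mu
    m = -2.0 if random.random() < P_TRUE else 2.0
    return random.gauss(m, 1.0)

beta = 0.0
for n in range(1, 200001):
    xn, zn = draw(), draw()      # (X_n, Z_n): independent draws from mu
    y = 1.0 if zn <= xn else 0.0 # Y_n = I_{Z_n}(X_n), so E[Y|X] = F(X)
    s = sigmoid(beta)
    phi_val = s * psi(-2.0, xn) + (1.0 - s) * psi(2.0, xn)   # Phi(X_n, beta)
    grad = s * (1.0 - s) * (psi(-2.0, xn) - psi(2.0, xn))    # dPhi/dbeta
    # Theta_{n+1} = Theta_n - a_n grad_theta Phi (Phi - Y), a_n = n^{-0.6},
    # which satisfies (H1): sum a_n = inf, sum a_n^2 < inf.
    beta -= n ** -0.6 * grad * (phi_val - y)

print(round(sigmoid(beta), 2))   # estimated weight, close to 0.3
```

Estimating the component parameters $m_j$ as well uses the same recursion with the extra coordinates $\partial\Phi/\partial m_j$ in the gradient.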
More informationJournal of Inequalities in Pure and Applied Mathematics
Journal of Inequalities in Pure and Applied athematics http://jipam.vu.edu.au/ Volume 4, Issue 5, Article 98, 2003 ASYPTOTIC BEHAVIOUR OF SOE EQUATIONS IN ORLICZ SPACES D. ESKINE AND A. ELAHI DÉPARTEENT
More informationMulti-operator scaling random fields
Available online at www.sciencedirect.com Stochastic Processes and their Applications 121 (2011) 2642 2677 www.elsevier.com/locate/spa Multi-operator scaling random fields Hermine Biermé a, Céline Lacaux
More informationDivision of the Humanities and Social Sciences. Sums of sets, etc.
Division of the Humanities and Social Sciences Sums of sets, etc. KC Border September 2002 Rev. November 2012 Rev. September 2013 If E and F are subsets of R m, define the sum E + F = {x + y : x E; y F
More informationEconomics 620, Lecture 2: Regression Mechanics (Simple Regression)
1 Economics 620, Lecture 2: Regression Mechanics (Simple Regression) Observed variables: y i ; x i i = 1; :::; n Hypothesized (model): Ey i = + x i or y i = + x i + (y i Ey i ) ; renaming we get: y i =
More informationON THE PATHWISE UNIQUENESS OF SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS
PORTUGALIAE MATHEMATICA Vol. 55 Fasc. 4 1998 ON THE PATHWISE UNIQUENESS OF SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS C. Sonoc Abstract: A sufficient condition for uniqueness of solutions of ordinary
More informationSTABILITY OF n-th ORDER FLETT S POINTS AND LAGRANGE S POINTS
SARAJEVO JOURNAL OF MATHEMATICS Vol.2 (4) (2006), 4 48 STABILITY OF n-th ORDER FLETT S POINTS AND LAGRANGE S POINTS IWONA PAWLIKOWSKA Abstract. In this article we show the stability of Flett s points and
More informationOptimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison
Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big
More informationEconomics 620, Lecture 8: Asymptotics I
Economics 620, Lecture 8: Asymptotics I Nicholas M. Kiefer Cornell University Professor N. M. Kiefer (Cornell University) Lecture 8: Asymptotics I 1 / 17 We are interested in the properties of estimators
More informationBeyond stochastic gradient descent for large-scale machine learning
Beyond stochastic gradient descent for large-scale machine learning Francis Bach INRIA - Ecole Normale Supérieure, Paris, France Joint work with Eric Moulines, Nicolas Le Roux and Mark Schmidt - CAP, July
More informationThe ϵ-capacity of a gain matrix and tolerable disturbances: Discrete-time perturbed linear systems
IOSR Journal of Mathematics (IOSR-JM) e-issn: 2278-5728, p-issn: 2319-765X. Volume 11, Issue 3 Ver. IV (May - Jun. 2015), PP 52-62 www.iosrjournals.org The ϵ-capacity of a gain matrix and tolerable disturbances:
More informationMATH 104: INTRODUCTORY ANALYSIS SPRING 2008/09 PROBLEM SET 8 SOLUTIONS
MATH 04: INTRODUCTORY ANALYSIS SPRING 008/09 PROBLEM SET 8 SOLUTIONS. Let f : R R be continuous periodic with period, i.e. f(x + ) = f(x) for all x R. Prove the following: (a) f is bounded above below
More information1 Basics of probability theory
Examples of Stochastic Optimization Problems In this chapter, we will give examples of three types of stochastic optimization problems, that is, optimal stopping, total expected (discounted) cost problem,
More informationFUNCTIONAL SEQUENTIAL AND TRIGONOMETRIC SUMMABILITY OF REAL AND COMPLEX FUNCTIONS
International Journal of Analysis Applications ISSN 91-8639 Volume 15, Number (17), -8 DOI: 1.894/91-8639-15-17- FUNCTIONAL SEQUENTIAL AND TRIGONOMETRIC SUMMABILITY OF REAL AND COMPLEX FUNCTIONS M.H. HOOSHMAND
More informationChp 4. Expectation and Variance
Chp 4. Expectation and Variance 1 Expectation In this chapter, we will introduce two objectives to directly reflect the properties of a random variable or vector, which are the Expectation and Variance.
More informationPattern Recognition and Machine Learning. Bishop Chapter 6: Kernel Methods
Pattern Recognition and Machine Learning Chapter 6: Kernel Methods Vasil Khalidov Alex Kläser December 13, 2007 Training Data: Keep or Discard? Parametric methods (linear/nonlinear) so far: learn parameter
More informationLinear Models Review
Linear Models Review Vectors in IR n will be written as ordered n-tuples which are understood to be column vectors, or n 1 matrices. A vector variable will be indicted with bold face, and the prime sign
More informationCHANGE DETECTION WITH UNKNOWN POST-CHANGE PARAMETER USING KIEFER-WOLFOWITZ METHOD
CHANGE DETECTION WITH UNKNOWN POST-CHANGE PARAMETER USING KIEFER-WOLFOWITZ METHOD Vijay Singamasetty, Navneeth Nair, Srikrishna Bhashyam and Arun Pachai Kannu Department of Electrical Engineering Indian
More information13. Nonlinear least squares
L. Vandenberghe ECE133A (Fall 2018) 13. Nonlinear least squares definition and examples derivatives and optimality condition Gauss Newton method Levenberg Marquardt method 13.1 Nonlinear least squares
More informationProduct measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 2017 Nadia S. Larsen. 17 November 2017.
Product measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 017 Nadia S. Larsen 17 November 017. 1. Construction of the product measure The purpose of these notes is to prove the main
More informationHamad Talibi Alaoui and Radouane Yafia. 1. Generalities and the Reduced System
FACTA UNIVERSITATIS (NIŠ) Ser. Math. Inform. Vol. 22, No. 1 (27), pp. 21 32 SUPERCRITICAL HOPF BIFURCATION IN DELAY DIFFERENTIAL EQUATIONS AN ELEMENTARY PROOF OF EXCHANGE OF STABILITY Hamad Talibi Alaoui
More informationExercise List: Proving convergence of the Gradient Descent Method on the Ridge Regression Problem.
Exercise List: Proving convergence of the Gradient Descent Method on the Ridge Regression Problem. Robert M. Gower September 5, 08 Introduction Ridge regression is perhaps the simplest example of a training
More informationLimiting distribution for subcritical controlled branching processes with random control function
Available online at www.sciencedirect.com Statistics & Probability Letters 67 (2004 277 284 Limiting distribution for subcritical controlled branching processes with random control function M. Gonzalez,
More information