Technical Report TR96-9, November
Noise Suppression in Training Data for Improving Generalization
Akiko Nakashima, Akira Hirabayashi, and Hidemitsu Ogawa
Department of Computer Science, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro, Tokyo 152, Japan
(c) The author(s) of this report reserve all the rights.
To appear in IEEE International Joint Conference on Neural Networks '98.
Noise Suppression in Training Data for Improving Generalization

Akiko Nakashima, Akira Hirabayashi and Hidemitsu Ogawa
Dept. of Computer Science, Tokyo Institute of Technology, 2-12-1 O-okayama, Meguro-ku, Tokyo 152, Japan.

Abstract

Multi-layer feedforward neural networks are trained using the error back-propagation (BP) algorithm. This algorithm minimizes the error between the outputs of a neural network (NN) and the training data. Hence, in the case of noisy training data, a trained network memorizes the noisy outputs for the given inputs. Such learning is called rote memorization learning (RML). In this paper we propose error correcting memorization learning (CML), which can suppress noise in training data. In order to evaluate the generalization ability of CML, it is compared with the projection learning (PL) criterion. It is theoretically proved that although CML merely suppresses noise in the training data, it provides the same generalization as PL under a certain necessary and sufficient condition.

1 Introduction

The learning problem of feed-forward neural networks with noisy training data is considered from the functional analytic point of view. What is important in the learning problem is to achieve a high level of generalization, that is, to construct a neural network which outputs true values not only for the training inputs but also for novel inputs. The back-propagation algorithm is often used for training a neural network. It is derived from the criterion of so-called rote memorization learning (RML), which minimizes the error between the outputs of a neural network and the noisy training data. Hence, RML does not guarantee generalization ability. In order to solve this problem, a regularization method was proposed [5],[2]. However, it still uses the RML criterion, merely adding a smoothness term. In this paper, we propose error correcting memorization learning (CML) to suppress noise in training data.
The generalization ability of CML is evaluated by comparing it with projection learning (PL), which reduces the error in the original function space. We obtain a necessary and sufficient condition under which CML not only suppresses noise in the training data but also improves generalization. It is known that RML also provides the same generalization as PL under some condition. Although an analytical solution for this condition was provided in [8], here we use the results on CML to interpret and clarify that solution.

2 Neural network learning as an inverse problem

In this section, we present a brief review of the basic formalization necessary for discussing the learning problem in NNs from the functional analytic point of view. Let us begin by considering a three-layer feedforward neural network whose numbers of input, hidden, and output units are L, N, and 1, respectively, as shown in Fig. 1. Let x be the L-dimensional vector consisting of the L inputs, which is referred to as the input vector. The network can then be considered as a real-valued function f(x) of L variables:

f(x) = Σ_{n=1}^N w_n u_n(x),

where u_n(x) is the output of the n-th hidden unit and w_n is the corresponding output-layer weight.

[Figure 1: NN as a real-valued function.]

The learning problem is to construct a neural network by using a set of training data so that the NN expresses the best approximation f₀(x) to a desired function f(x) under some learning criterion. We define some of the notation used here.

{x_m}_{m=1}^M : A training set, given as a set of M input vectors.
{y_m}_{m=1}^M : The corresponding noisy output values, where y_m = f(x_m) + n_m.

{(x_m, y_m)}_{m=1}^M : A set of training data.

Once a training set {x_m}_{m=1}^M is fixed, the corresponding true outputs {f(x_m)}_{m=1}^M are uniquely determined by f. Hence, we can introduce an operator A which maps f to the vector consisting of {f(x_m)}_{m=1}^M. Let y and n be the M-dimensional vectors consisting of the elements {y_m}_{m=1}^M and {n_m}_{m=1}^M, respectively. Then we have

y = Af + n.  (1)

The operator A is called the sampling operator. It is a linear operator even when we are concerned with nonlinear NNs. Let H be the set of all functions f to be approximated by the neural networks. Assume that H is a Hilbert space with a reproducing kernel K(x, x'). Let D be the domain of the functions f, which is a subset of the L-dimensional Euclidean space R^L. The reproducing kernel K(x, x') is a bivariate function defined on D × D which satisfies the following two conditions:

1. For any fixed x' in D, K(x, x') is a function of x in H.
2. For any f in H and any x' in D, it holds that

(f(·), K(·, x')) = f(x'),  (2)

where the left-hand side of eq.(2) denotes the inner product in H.

In the theory of Hilbert spaces, arguments are developed by regarding a function as a point in the space. Thus, notions such as 'the value of a function at a point' cannot be discussed within the general framework of Hilbert spaces. However, if the Hilbert space has a reproducing kernel, the value of a function at a point can be dealt with, as shown in eq.(2). The sampling operator A is expressed by the reproducing kernel as

A = Σ_{m=1}^M (e_m ⊗ K(·, x_m)),  (3)

where {e_m}_{m=1}^M is the so-called natural basis of R^M, i.e., e_m is the M-dimensional vector whose elements are all zero except for the m-th element, which is equal to 1. The notation (· ⊗ ·) is the Schatten product, defined by

(e_m ⊗ g)f = (f, g) e_m.  (4)

Now the learning problem is the problem of obtaining an estimate, say f₀, of f from y in the model (1).
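As a concrete illustration of eqs.(1)-(4), the sampling operator can be written down explicitly for a small reproducing-kernel space. The sketch below is not from the paper; it borrows the two-dimensional space H = span{sin, cos} and the kernel K(x, x') = cos(x − x') from the artificial problem reported later, and represents a function by its coefficient vector in the orthonormal basis {sin, cos}, so that the inner product in H becomes the ordinary dot product:

```python
import numpy as np

# Illustrative sketch (not the authors' code): H = span{sin, cos} with an
# orthonormal basis, so a function f = a*sin + b*cos is the coefficient
# vector (a, b), and the H inner product is the dot product.

def kernel_coeffs(x):
    # Coefficients of K(., x) = cos(. - x) = sin(x)*sin(.) + cos(x)*cos(.)
    return np.array([np.sin(x), np.cos(x)])

def sampling_operator(points):
    # A = sum_m e_m (x) K(., x_m)  (eq.(3)): row m holds the coefficients
    # of K(., x_m), so (Af)_m = (f, K(., x_m)) = f(x_m) by eq.(2).
    return np.stack([kernel_coeffs(x) for x in points])

f = np.array([0.5, 3.0])          # coefficients of f(x) = 0.5 sin x + 3 cos x
x_train = [0.8, 0.4, 1.0]
A = sampling_operator(x_train)    # M x dim(H) = 3 x 2 matrix
true_outputs = A @ f              # (f(x_1), f(x_2), f(x_3))

# Reproducing property (eq.(2)) checked at a novel point x':
xp = 0.3
assert np.isclose(f @ kernel_coeffs(xp), 0.5 * np.sin(xp) + 3 * np.cos(xp))
```

Noisy training data y = Af + n (eq.(1)) are then just this matrix-vector product plus a noise vector.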
This can be considered as an inverse problem [4], equivalent to obtaining an operator X which provides f₀ from y:

f₀ = Xy.  (5)

The operator X is called the learning operator. It can be optimized based on different learning criteria [4]. We denote a criterion by J in general, and an operator X satisfying J by A^(J).

3 Rote memorization learning

The BP method minimizes the training error, that is,

Σ_{m=1}^M (f₀(x_m) − y_m)².  (6)

Hence, the learning criterion for the BP method is as follows.

Definition 1 (Rote memorization learning) If an operator X minimizes the functional

J_RM[X] = ||AXy − y||²,  (7)

then X is called the rote memorization learning (RML) operator and is denoted by A^(RM), where ||·|| is the norm in R^M.

A general form of the RML operator is given as

A^(RM) = A† + Y − A†AY,  (8)

where A† is the Moore-Penrose generalized inverse of A [1] and Y is an arbitrary operator from R^M to H. J_RM requires only that the given noisy training data be memorized by rote.

4 Error correcting memorization learning

When we expect a NN to output the correct values for the given inputs, the mean squared error between the outputs of the NN and the correct values underlying the noisy training data should be minimized. This error is expressed by

E_n[Σ_{m=1}^M (f₀(x_m) − f(x_m))²] = E_n[||Af₀ − Af||²],  (9)

where E_n denotes the expectation over the noise ensemble {n}. Using eqs.(1) and (5), we can decompose Af₀ as

Af₀ = AXAf + AXn.  (10)

The first and second terms on the right-hand side of eq.(10) are the signal component and the noise
component of Af₀, respectively. The former is deterministic, whereas the latter is probabilistic in nature. Therefore, we require that the signal component of Af₀ agree with the true values Af of the training data. This leads us to the concept of error correcting memorization learning.

[Figure 2: Learned functions and the mechanism of noise suppression.]

Definition 2 (Error correcting memorization learning) For any f₀ given by eqs.(5) and (1), if an operator X minimizes the functional

J_CM[X] = E_n[||Af₀ − Af||²]  (11)

under the constraint

AXA = A,  (12)

then X is called the error correcting memorization learning (CML) operator and is denoted by A^(CM).

Theorem 1 A general form of the CML operator is given as

A^(CM) = V†A*U† + Y − A†AYUU†,  (13)

where A* is the adjoint operator of A, and U and V are defined as

U = AA* + Q,  V = A*U†A,  (14)

where Q is the correlation matrix of the noise and Y is an arbitrary operator from R^M to H. The minimum value of J_CM[X] is given by

min_X J_CM[X] = J_CM[A^(CM)] = tr(AV†A*) − tr(AA*).  (15)

The correlation matrix of the noise is the M × M matrix defined by Q = E_n(n ⊗ n), whose ij-th component is E_n(n_i n_j).

The CML operator in eq.(13) is determined by the operators A, Q, and Y. A is obtained from a training set, as shown in eq.(3). Q is the correlation matrix, which is determined by the nature of the noise. Note that the noise is not limited to, for example, a normal distribution or a zero-mean distribution. Hence, we can apply the theorem to any type of noise as long as Q can be estimated.

Statistically, almost all noisy training data y lie in the range of U, denoted by R(U) [6]. R(U) has the following (generally nonorthogonal) direct sum decomposition:

R(U) = R(A) ∔ QR(A)⊥,  (16)

where R(A)⊥ is the orthogonal complement of R(A). This decomposition yields the following result.

Theorem 2 An operator X satisfies the CML criterion if and only if

AXy = y for y ∈ R(A),  AXy = 0 for y ∈ QR(A)⊥.  (17)
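Theorems 1 and 2 can be checked numerically in the finite-dimensional setting of the paper's artificial problem. The sketch below is my own illustration, not part of the paper: it takes Y = 0 in eq.(13) and uses the coefficient representation of H = span{sin, cos}, then verifies the constraint AXA = A and the two cases of eq.(17):

```python
import numpy as np

# Numerical check of Theorems 1 and 2 (illustration only; Y = 0 in eq.(13)).
x_train = [0.8, 0.4, 1.0]
A = np.stack([[np.sin(x), np.cos(x)] for x in x_train])   # sampling operator
Q = np.array([[0.73, 0.80, 1.20],
              [0.80, 1.04, 1.50],
              [1.20, 1.50, 2.26]])                        # eq.(25)

U = A @ A.T + Q                        # eq.(14); U is invertible here
V = A.T @ np.linalg.inv(U) @ A
X_cml = np.linalg.inv(V) @ A.T @ np.linalg.inv(U)         # eq.(13) with Y = 0

# Constraint (12): AXA = A
assert np.allclose(A @ X_cml @ A, A)

# Eq.(17), first case: any y in R(A) is reproduced exactly
y_sig = A @ np.array([0.5, 3.0])
assert np.allclose(A @ X_cml @ y_sig, y_sig)

# Eq.(17), second case: any y in QR(A)⊥ is annihilated
z = np.linalg.svd(A)[0][:, -1]         # unit vector orthogonal to R(A)
assert np.allclose(A @ X_cml @ (Q @ z), 0, atol=1e-9)
```

The minimum value of eq.(15) can also be checked against the direct expression J_CM = tr((AX)Q(AX)*), which the asserts in the sketch's test exercise; both follow from the same U, V defined above.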
Theorem 2 shows the mechanism of noise suppression by CML, which is illustrated in Fig. 2. Let us consider a vector y in R(U). It is decomposed using eq.(16) as

y = Af + n₁ + n₂,  (18)

where n₁ and n₂ are the R(A)-component and the QR(A)⊥-component of n, respectively. For this vector y,

Af₀ = AA^(CM)y = Af + n₁,  (19)

which is the R(A)-component of y. CML removes the QR(A)⊥-component n₂ of n. In this sense QR(A)⊥
is the subspace which results in optimal noise suppression.

[Figure 3: Functions learned by CML and RML with the training set {x_m}_{m=1}^3 = {0.8, 0.4, 1}. *: training data; solid: original function; dashed: function learned by CML; dotted: function learned by RML.]

[Figure 4: Functions learned by CML and RML with the training set {x_m}_{m=1}^3 = {1, 1, 1}. *: training data; solid: original function; dashed: function learned by CML; dotted: function learned by RML.]

Next, we show results of CML on an artificial problem. Let H be the 2-dimensional function space spanned by

{φ_n(x)}_{n=1}^2 = {sin x, cos x}  (20)

and let the inner product in H be defined as

(f, g) = (1/π) ∫_{−π}^{π} f(x)g(x)dx,  f, g ∈ H.  (21)

Then {φ_1, φ_2} in eq.(20) becomes an orthonormal basis of H. The reproducing kernel of H is given by

K(x, x') = Σ_{n=1}^2 φ_n(x)φ_n(x')  (22)
         = cos(x − x').  (23)

Let us now consider the problem of learning the function

f(x) = 0.5 sin x + 3 cos x  (24)

in this function space H. We consider two different experiments. The first (Fig. 3) uses the 3 training points {x_m}_{m=1}^3 = {0.8, 0.4, 1}, while the second (Fig. 4) uses a different set of 3 training points, {x_m}_{m=1}^3 = {1, 1, 1}. Here, we assume that the noise is generated from the three-dimensional normal distribution with diagonal covariance matrix diag(0.3², 0.2², 0.1²) and mean E_n n = (0.8, 1.0, 1.5)ᵗ. Then the noise correlation matrix is given by

Q = [ 0.73  0.80  1.20 ;
      0.80  1.04  1.50 ;
      1.20  1.50  2.26 ].  (25)

In Figs. 3 and 4, the true sampled values {f(x_m)}_{m=1}^3 and the noisy training data y are denoted by 'o' and '*', respectively. The original function f is drawn as a solid line, while the functions f₀ learned by CML and RML are shown by a dashed line and a dotted line, respectively. As for the training error, both experiments give similar results. The function f₀ learned by CML passes near the true points denoted by 'o', whereas f₀ learned by RML passes near the noisy data points denoted by '*'. This example shows that CML certainly suppresses noise in the training data.
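The Fig. 3 experiment can be reproduced in a few lines. The sketch below is my own reconstruction, not the authors' code: the third training point and the leading coefficient in eq.(24) are partly illegible in the source and are assumed to be 1 and 0.5, and for reproducibility the noise vector is fixed at its mean (0.8, 1.0, 1.5) instead of being drawn at random:

```python
import numpy as np

# Reconstruction sketch of the Fig. 3 experiment (assumed readings noted above),
# in the coefficient representation of H = span{sin x, cos x}.
x_train = np.array([0.8, 0.4, 1.0])
A = np.stack([[np.sin(x), np.cos(x)] for x in x_train])
f_true = np.array([0.5, 3.0])              # f(x) = 0.5 sin x + 3 cos x (eq.(24))
noise = np.array([0.8, 1.0, 1.5])          # one draw, fixed at E_n[n]
y = A @ f_true + noise                     # eq.(1)

Q = np.array([[0.73, 0.80, 1.20],
              [0.80, 1.04, 1.50],
              [1.20, 1.50, 2.26]])         # eq.(25)

# RML (Y = 0 in eq.(8)): rote memorization via the pseudoinverse
f_rml = np.linalg.pinv(A) @ y

# CML (Y = 0 in eq.(13)), with U and V from eq.(14)
U = A @ A.T + Q
V = A.T @ np.linalg.inv(U) @ A
f_cml = np.linalg.inv(V) @ A.T @ np.linalg.inv(U) @ y

print("RML coefficient error:", np.linalg.norm(f_rml - f_true))
print("CML coefficient error:", np.linalg.norm(f_cml - f_true))
```

With this deterministic noise draw, the CML estimate lands much closer to the true coefficients (0.5, 3) than the RML estimate, mirroring the qualitative behavior of Fig. 3: RML absorbs the strongly biased noise, while CML suppresses it through Q.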
As for the generalization, the experimental results are somewhat different. CML in Fig. 3 provides a better approximation than CML in Fig. 4. This result shows that the generalization of CML depends on the training set {x_m}_{m=1}^3.

5 Admissibility

CML evaluates the error only over the training set. Therefore, even when a network is successfully trained for the given inputs by CML, it is not guaranteed that f₀ provides the desired output values for novel inputs, as shown in Fig. 4. When we expect CML to achieve high generalization, we implicitly use the CML criterion as a substitute for some true criterion J which directly estimates the generalization error. In order to discuss the conditions under which we may substitute J_CM for J, the concept of admissibility is useful [7]. Consider the general case in which a criterion J' substitutes for a criterion J. Generally, there are many learning operators which satisfy a given criterion; the set of operators satisfying J is denoted by A{J}.
Definition 3 (Admissibility) [7]
(i) (Non-admissibility) If no J'-learning satisfies J, i.e., if it holds that
A{J} ∩ A{J'} = ∅,  (26)
then it is said that J does not admit J'.
(ii) (Partial admissibility) If there is at least one J'-learning which satisfies J, i.e., if it holds that
A{J} ∩ A{J'} ≠ ∅,  (27)
then it is said that J partially admits J'.
(iii) (Admissibility) If all J'-learnings satisfy J, i.e., if it holds that
A{J} ⊃ A{J'},  (28)
then it is said that J always admits J', or in brief, J admits J'.
(iv) (Complete admissibility) If J always admits J' and vice versa, i.e., if it holds that
A{J} = A{J'},  (29)
then it is said that J completely admits J'.
(v) (Inverse admissibility) If all J-learnings satisfy J', i.e., if it holds that
A{J} ⊂ A{J'},  (30)
then it is said that J is always admitted by J'.

Eq.(28) means that J' is sufficient for J, while eq.(30) means that J' is necessary for J. Based on the concept of admissibility, we shall discuss the generalization ability of CML in the next section.

6 Generalization ability of CML

In this section, as an example of the true criterion J, we consider projection learning (PL) [3]. Let P be the orthogonal projection operator onto R(A*).

Definition 4 (Projection learning) [3] For any f₀ given by eqs.(5) and (1), if an operator X minimizes the functional

J_P[X] = E_n[||f₀ − Pf||²]  (31)

under the constraint

XA = P,  (32)

then X is called the projection learning (PL) operator and is denoted by A^(P), where ||·|| is the norm in H.

Whenever we use a linear operator X for constructing f₀, the range of X is a subspace of H. Hence 'the best approximation' means that f₀ is the nearest point to f in the subspace R(X), i.e., the orthogonal projection of f onto R(X). R(A*) is the largest subspace in which we can obtain the orthogonal projection of f from y without knowing the original f. That is the reason why in eq.(31) the error is evaluated not between f₀ and f but between f₀ and Pf.
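In the same finite-dimensional setting as before, the PL constraint of eq.(32) can be verified numerically for the operator V†A*U†, which the paper later identifies as the PL solution contained in the CML operator. This is a sketch under my assumed coefficient representation of H = span{sin, cos}; since A has full column rank here, R(A*) is all of H and P reduces to the identity:

```python
import numpy as np

# Check of the PL constraint XA = P (eq.(32)) for X = V†A*U†
# (illustration only, in the coefficient representation of H).
x_train = [0.8, 0.4, 1.0]
A = np.stack([[np.sin(x), np.cos(x)] for x in x_train])
Q = np.array([[0.73, 0.80, 1.20],
              [0.80, 1.04, 1.50],
              [1.20, 1.50, 2.26]])
U = A @ A.T + Q                        # eq.(14)
V = A.T @ np.linalg.inv(U) @ A
X_pl = np.linalg.inv(V) @ A.T @ np.linalg.inv(U)

# P: orthogonal projector onto R(A*).  A has full column rank here,
# so R(A*) = H and P is the 2x2 identity on coefficient space.
P = np.linalg.pinv(A) @ A
```

The check X_pl @ A == P holds because X_pl A = V⁻¹(A*U⁻¹A) = V⁻¹V, which is exactly the design of V in eq.(14).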
Eqs.(5) and (1) yield

f₀ = XAf + Xn.  (33)

The first term XAf on the right-hand side of eq.(33) is the signal component of f₀. It is independent of the noise n in y. Hence, it is required that the signal component of f₀ agree with the best approximation Pf of f in R(A*), which is represented by the constraint of eq.(32).

Let us consider the case where the CML criterion J_CM is used as a substitute for the PL criterion J_P. In this case, the following two of the five kinds of admissibility listed in Section 5 appear.

Theorem 3 (Inverse admissibility) All the PL operators satisfy the CML criterion, i.e., it always holds that

A{J_P} ⊂ A{J_CM}.  (34)

Theorem 4 (Complete admissibility) The PL criterion completely admits the CML criterion, i.e., it holds that

A{J_P} = A{J_CM}  (35)

if and only if

N(A) = {0}  (36)

or

N(A) = H and R(Q) = {0},  (37)

where N(A) represents the null space of A.

Theorem 3 says that any projection learning operator A^(P) always suppresses noise in the training data. Theorems 3 and 4 show that, in general, there are A^(CM)'s which do not satisfy J_P. Hence, noise suppression in the training data is not enough for CML to obtain the same generalization as PL; in addition, eq.(36) or eq.(37) has to be satisfied. N(A) is the subspace consisting of the functions which are mapped to the zero vector by the sampling operator A. Eq.(37) is the condition that all the training data are statistically always zero, which does not make sense in practical learning problems. Therefore, eq.(36) is the essential condition for complete admissibility. When eq.(36) does not hold, the situation is as follows. In Fig. 2, N(A), which contains many nonzero functions, is drawn as a line perpendicular to R(A*).
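Condition (36) is easy to test numerically: N(A) = {0} exactly when the sampling matrix has full column rank. A small check (my own illustration; the training points {0.8, 0.4, 1} and {1, 1, 1} are assumed readings of the Fig. 3 and Fig. 4 sets):

```python
import numpy as np

# Rank test for condition (36), N(A) = {0}, in H = span{sin, cos}.
def sampling_matrix(points):
    return np.stack([[np.sin(x), np.cos(x)] for x in points])

A3 = sampling_matrix([0.8, 0.4, 1.0])   # Fig. 3 training set (assumed)
A4 = sampling_matrix([1.0, 1.0, 1.0])   # Fig. 4 training set (assumed)

rank3 = np.linalg.matrix_rank(A3)       # 2 = dim H  ->  N(A) = {0}
rank4 = np.linalg.matrix_rank(A4)       # 1 < dim H  ->  N(A) is nontrivial
```

With all Fig. 4 points equal, the kernel functions K(·, x_m) coincide, so A cannot separate the two basis directions; eq.(36) fails, matching the poorer generalization seen in Fig. 4.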
From eq.(13), the function learned by CML from y in R(U) is given as

f₀ = V†A*U†y + f₁,  (38)

where f₁ is an arbitrary function in N(A). The first term on the right-hand side of eq.(38) is the function obtained by PL from the same y. Hence, the generalization error of the function in eq.(38) depends on the selection of f₁. When eq.(36) holds, N(A) consists only of the zero function. Then the function in eq.(38) becomes equal to the function obtained by PL. Whether eq.(36) holds or not depends on the training set {x_m}_{m=1}^M, because the sampling operator A is determined by the training set, as shown in eq.(3). Hence, the generalization of CML depends on the selection of the training set. The training set used in Fig. 3 satisfies eq.(36), whereas the training set used in Fig. 4 does not. This difference causes the variation in generalization between Fig. 3 and Fig. 4.

7 Generalization ability of RML

Although RML does not consider the suppression of noise, it provides the same generalization as PL under certain conditions. In this section we interpret these conditions by using the results on CML. Since statistically almost all y lie in R(U), let us consider RML with y limited to R(U). Complete admissibility holds for CML if eq.(36) is satisfied, while for RML the additional condition

QR(A)⊥ ⊂ R(A)⊥  (39)

is necessary. The left-hand side of eq.(39) is the subspace which results in optimal noise suppression in the sense of CML, as shown by Theorem 2. RML also suppresses noise to some extent, although this may seem to contradict the RML criterion. RML constructs f₀ so that Af₀ becomes the best approximation to the noisy y. Af₀ belongs to R(A), even though in general y does not belong to R(A) because of the noise. Hence, the best approximation to y is the orthogonal projection of y onto R(A). As a result, the component of the noise in R(A)⊥ is removed, independently of the nature of the noise. When R(A)⊥ includes QR(A)⊥, as in eq.(39), for any y in R(U) the component in R(A)⊥
becomes equal to the component in QR(A)⊥. Then RML can suppress as much noise in the training data as CML does.

8 Conclusions

We proposed error correcting memorization learning (CML), which can suppress noise in training data by using the noise correlation matrix. By comparing the generalization ability of CML with that of PL, we obtained a necessary and sufficient condition under which CML provides the same generalization as PL. In the case of RML, a further condition is necessary. We interpreted the meaning of this additional condition by using the results on CML.

Acknowledgements

We would like to thank Mr. S. Vijayakumar for fruitful discussions. This work was supported by the Grants-in-Aid for Scientific Research # and #4429.

References

[1] A. Albert, Regression and the Moore-Penrose Pseudoinverse, Academic Press (1972).
[2] C. M. Bishop, "Improving the generalization properties of radial basis function networks", Neural Computation, vol.3, no.4, pp.579-588 (1991).
[3] H. Ogawa, "Projection filter regularization of ill-conditioned problem", Proc. SPIE, Inverse Problems in Optics, vol.808 (1987).
[4] H. Ogawa, "Neural network learning, generalization and over-learning", Proc. ICIIPS'92 (Beijing), Oct.-Nov. 1992, vol.2 (1992).
[5] T. Poggio and F. Girosi, "Networks for approximation and learning", Proc. of the IEEE, vol.78, no.9, pp.1481-1497 (Sep. 1990).
[6] Y. Yamashita and H. Ogawa, "Properties of averaged projection filter for image restoration", Trans. IEICE, Japan, vol.J74-D-II, no.2 (Feb. 1991) (in Japanese).
[7] H. Ogawa and Y. Yamasaki, "A theory of over-learning", Trans. IEICE, Japan, vol.J76-D-II, no.7 (June 1993) (in Japanese); an English short version appeared in Artificial Neural Networks 2, vol.1, I. Aleksander and J. Taylor, Eds., North-Holland, pp.25-28 (1992).
[8] A. Hirabayashi and H. Ogawa, "Admissibility of memorization learning with respect to projection learning in the presence of noise", Proc. ICNN'96 (Washington, D.C.), Jun. 3-6, 1996, vol.1 (June 1996).
More informationVectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x =
Linear Algebra Review Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1 x x = 2. x n Vectors of up to three dimensions are easy to diagram.
More informationLinear algebra II Homework #1 due Thursday, Feb A =
Homework #1 due Thursday, Feb. 1 1. Find the eigenvalues and the eigenvectors of the matrix [ ] 3 2 A =. 1 6 2. Find the eigenvalues and the eigenvectors of the matrix 3 2 2 A = 2 3 2. 2 2 1 3. The following
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationPreliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012
Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.
More informationThe Gram-Schmidt Process 1
The Gram-Schmidt Process In this section all vector spaces will be subspaces of some R m. Definition.. Let S = {v...v n } R m. The set S is said to be orthogonal if v v j = whenever i j. If in addition
More informationLINEAR ALGEBRA REVIEW
LINEAR ALGEBRA REVIEW When we define a term, we put it in boldface. This is a very compressed review; please read it very carefully and be sure to ask questions on parts you aren t sure of. x 1 WedenotethesetofrealnumbersbyR.
More information4.3 - Linear Combinations and Independence of Vectors
- Linear Combinations and Independence of Vectors De nitions, Theorems, and Examples De nition 1 A vector v in a vector space V is called a linear combination of the vectors u 1, u,,u k in V if v can be
More informationMATH 22A: LINEAR ALGEBRA Chapter 4
MATH 22A: LINEAR ALGEBRA Chapter 4 Jesús De Loera, UC Davis November 30, 2012 Orthogonality and Least Squares Approximation QUESTION: Suppose Ax = b has no solution!! Then what to do? Can we find an Approximate
More informationContents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces
Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v 250) Contents 2 Vector Spaces 1 21 Vectors in R n 1 22 The Formal Denition of a Vector Space 4 23 Subspaces 6 24 Linear Combinations and
More informationIntertibility and spectrum of the multiplication operator on the space of square-summable sequences
Intertibility and spectrum of the multiplication operator on the space of square-summable sequences Objectives Establish an invertibility criterion and calculate the spectrum of the multiplication operator
More informationQuantum logics with given centres and variable state spaces Mirko Navara 1, Pavel Ptak 2 Abstract We ask which logics with a given centre allow for en
Quantum logics with given centres and variable state spaces Mirko Navara 1, Pavel Ptak 2 Abstract We ask which logics with a given centre allow for enlargements with an arbitrary state space. We show in
More informationNew concepts: Span of a vector set, matrix column space (range) Linearly dependent set of vectors Matrix null space
Lesson 6: Linear independence, matrix column space and null space New concepts: Span of a vector set, matrix column space (range) Linearly dependent set of vectors Matrix null space Two linear systems:
More informationLecture 2: Review of Prerequisites. Table of contents
Math 348 Fall 217 Lecture 2: Review of Prerequisites Disclaimer. As we have a textbook, this lecture note is for guidance and supplement only. It should not be relied on when preparing for exams. In this
More informationAn Integral Representation of Functions using. Three-layered Networks and Their Approximation. Bounds. Noboru Murata 1
An Integral epresentation of Functions using Three-layered Networks and Their Approximation Bounds Noboru Murata Department of Mathematical Engineering and Information Physics, University of Tokyo, Hongo
More informationDesigning Information Devices and Systems II
EECS 16B Fall 2016 Designing Information Devices and Systems II Linear Algebra Notes Introduction In this set of notes, we will derive the linear least squares equation, study the properties symmetric
More informationMTH 2032 SemesterII
MTH 202 SemesterII 2010-11 Linear Algebra Worked Examples Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education December 28, 2011 ii Contents Table of Contents
More informationChapter 3: Vector Spaces x1: Basic concepts Basic idea: a vector space V is a collection of things you can add together, and multiply by scalars (= nu
Math 314 Topics for second exam Technically, everything covered by the rst exam plus Chapter 2 x6 Determinants (Square) matrices come in two avors: invertible (all Ax = b have a solution) and noninvertible
More informationε ε
The 8th International Conference on Computer Vision, July, Vancouver, Canada, Vol., pp. 86{9. Motion Segmentation by Subspace Separation and Model Selection Kenichi Kanatani Department of Information Technology,
More information1. Subspaces A subset M of Hilbert space H is a subspace of it is closed under the operation of forming linear combinations;i.e.,
Abstract Hilbert Space Results We have learned a little about the Hilbert spaces L U and and we have at least defined H 1 U and the scale of Hilbert spaces H p U. Now we are going to develop additional
More informationMath Real Analysis II
Math 4 - Real Analysis II Solutions to Homework due May Recall that a function f is called even if f( x) = f(x) and called odd if f( x) = f(x) for all x. We saw that these classes of functions had a particularly
More informationBearing fault diagnosis based on EMD-KPCA and ELM
Bearing fault diagnosis based on EMD-KPCA and ELM Zihan Chen, Hang Yuan 2 School of Reliability and Systems Engineering, Beihang University, Beijing 9, China Science and Technology on Reliability & Environmental
More informationSubspace Information Criterion for Model Selection
Neural Computation, vol.13, no.8, pp.1863 1889, 21. 1 Subspace Information Criterion for Model Selection Masashi Sugiyama Hidemitsu Ogawa Department of Computer Science, Graduate School of Information
More informationLinear Algebra, Summer 2011, pt. 3
Linear Algebra, Summer 011, pt. 3 September 0, 011 Contents 1 Orthogonality. 1 1.1 The length of a vector....................... 1. Orthogonal vectors......................... 3 1.3 Orthogonal Subspaces.......................
More informationDS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.
DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1
More informationOctober 25, 2013 INNER PRODUCT SPACES
October 25, 2013 INNER PRODUCT SPACES RODICA D. COSTIN Contents 1. Inner product 2 1.1. Inner product 2 1.2. Inner product spaces 4 2. Orthogonal bases 5 2.1. Existence of an orthogonal basis 7 2.2. Orthogonal
More informationOld painting digital color restoration
Old painting digital color restoration Michail Pappas Ioannis Pitas Dept. of Informatics, Aristotle University of Thessaloniki GR-54643 Thessaloniki, Greece Abstract Many old paintings suffer from the
More informationSeminar on Linear Algebra
Supplement Seminar on Linear Algebra Projection, Singular Value Decomposition, Pseudoinverse Kenichi Kanatani Kyoritsu Shuppan Co., Ltd. Contents 1 Linear Space and Projection 1 1.1 Expression of Linear
More informationDavid Hilbert was old and partly deaf in the nineteen thirties. Yet being a diligent
Chapter 5 ddddd dddddd dddddddd ddddddd dddddddd ddddddd Hilbert Space The Euclidean norm is special among all norms defined in R n for being induced by the Euclidean inner product (the dot product). A
More informationTORWARDS A GENERAL FORMULATION FOR OVER-SAMPLING AND UNDER-SAMPLING
TORWARDS A GEERAL FORMULATIO FOR OVER-SAMPLIG AD UDER-SAMPLIG Aira Hirabayashi 1 and Laurent Condat 2 1 Dept. of Information Science and Engineering, Yamaguchi University, 2-16-1, Toiwadai, Ube 755-8611,
More informationGaussian Process Regression: Active Data Selection and Test Point. Rejection. Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer
Gaussian Process Regression: Active Data Selection and Test Point Rejection Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer Department of Computer Science, Technical University of Berlin Franklinstr.8,
More informationLECTURE 7. k=1 (, v k)u k. Moreover r
LECTURE 7 Finite rank operators Definition. T is said to be of rank r (r < ) if dim T(H) = r. The class of operators of rank r is denoted by K r and K := r K r. Theorem 1. T K r iff T K r. Proof. Let T
More informationTHE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR. Petr Pollak & Pavel Sovka. Czech Technical University of Prague
THE PROBLEMS OF ROBUST LPC PARAMETRIZATION FOR SPEECH CODING Petr Polla & Pavel Sova Czech Technical University of Prague CVUT FEL K, 66 7 Praha 6, Czech Republic E-mail: polla@noel.feld.cvut.cz Abstract
More informationMath Linear Algebra II. 1. Inner Products and Norms
Math 342 - Linear Algebra II Notes 1. Inner Products and Norms One knows from a basic introduction to vectors in R n Math 254 at OSU) that the length of a vector x = x 1 x 2... x n ) T R n, denoted x,
More information5 and A,1 = B = is obtained by interchanging the rst two rows of A. Write down the inverse of B.
EE { QUESTION LIST EE KUMAR Spring (we will use the abbreviation QL to refer to problems on this list the list includes questions from prior midterm and nal exams) VECTORS AND MATRICES. Pages - of the
More informationwhich arises when we compute the orthogonal projection of a vector y in a subspace with an orthogonal basis. Hence assume that P y = A ij = x j, x i
MODULE 6 Topics: Gram-Schmidt orthogonalization process We begin by observing that if the vectors {x j } N are mutually orthogonal in an inner product space V then they are necessarily linearly independent.
More informationPh 219/CS 219. Exercises Due: Friday 20 October 2006
1 Ph 219/CS 219 Exercises Due: Friday 20 October 2006 1.1 How far apart are two quantum states? Consider two quantum states described by density operators ρ and ρ in an N-dimensional Hilbert space, and
More informationLinear Algebra (Review) Volker Tresp 2017
Linear Algebra (Review) Volker Tresp 2017 1 Vectors k is a scalar (a number) c is a column vector. Thus in two dimensions, c = ( c1 c 2 ) (Advanced: More precisely, a vector is defined in a vector space.
More informationHilbert Spaces: Infinite-Dimensional Vector Spaces
Hilbert Spaces: Infinite-Dimensional Vector Spaces PHYS 500 - Southern Illinois University October 27, 2016 PHYS 500 - Southern Illinois University Hilbert Spaces: Infinite-Dimensional Vector Spaces October
More information1. The Polar Decomposition
A PERSONAL INTERVIEW WITH THE SINGULAR VALUE DECOMPOSITION MATAN GAVISH Part. Theory. The Polar Decomposition In what follows, F denotes either R or C. The vector space F n is an inner product space with
More informationKernels for Multi task Learning
Kernels for Multi task Learning Charles A Micchelli Department of Mathematics and Statistics State University of New York, The University at Albany 1400 Washington Avenue, Albany, NY, 12222, USA Massimiliano
More informationExercises * on Linear Algebra
Exercises * on Linear Algebra Laurenz Wiskott Institut für Neuroinformatik Ruhr-Universität Bochum, Germany, EU 4 February 7 Contents Vector spaces 4. Definition...............................................
More informationMatrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =
30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can
More informationOptimum Sampling Vectors for Wiener Filter Noise Reduction
58 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 50, NO. 1, JANUARY 2002 Optimum Sampling Vectors for Wiener Filter Noise Reduction Yukihiko Yamashita, Member, IEEE Absact Sampling is a very important and
More informationGeneral Inner Product & Fourier Series
General Inner Products 1 General Inner Product & Fourier Series Advanced Topics in Linear Algebra, Spring 2014 Cameron Braithwaite 1 General Inner Product The inner product is an algebraic operation that
More informationPart 1a: Inner product, Orthogonality, Vector/Matrix norm
Part 1a: Inner product, Orthogonality, Vector/Matrix norm September 19, 2018 Numerical Linear Algebra Part 1a September 19, 2018 1 / 16 1. Inner product on a linear space V over the number field F A map,
More informationMath113: Linear Algebra. Beifang Chen
Math3: Linear Algebra Beifang Chen Spring 26 Contents Systems of Linear Equations 3 Systems of Linear Equations 3 Linear Systems 3 2 Geometric Interpretation 3 3 Matrices of Linear Systems 4 4 Elementary
More informationBasic Elements of Linear Algebra
A Basic Review of Linear Algebra Nick West nickwest@stanfordedu September 16, 2010 Part I Basic Elements of Linear Algebra Although the subject of linear algebra is much broader than just vectors and matrices,
More information(v, w) = arccos( < v, w >
MA322 Sathaye Notes on Inner Products Notes on Chapter 6 Inner product. Given a real vector space V, an inner product is defined to be a bilinear map F : V V R such that the following holds: For all v
More informationADJOINTS, ABSOLUTE VALUES AND POLAR DECOMPOSITIONS
J. OPERATOR THEORY 44(2000), 243 254 c Copyright by Theta, 2000 ADJOINTS, ABSOLUTE VALUES AND POLAR DECOMPOSITIONS DOUGLAS BRIDGES, FRED RICHMAN and PETER SCHUSTER Communicated by William B. Arveson Abstract.
More information= w 2. w 1. B j. A j. C + j1j2
Local Minima and Plateaus in Multilayer Neural Networks Kenji Fukumizu and Shun-ichi Amari Brain Science Institute, RIKEN Hirosawa 2-, Wako, Saitama 35-098, Japan E-mail: ffuku, amarig@brain.riken.go.jp
More informationLecture 4 February 2
4-1 EECS 281B / STAT 241B: Advanced Topics in Statistical Learning Spring 29 Lecture 4 February 2 Lecturer: Martin Wainwright Scribe: Luqman Hodgkinson Note: These lecture notes are still rough, and have
More informationusing the Hamiltonian constellations from the packing theory, i.e., the optimal sphere packing points. However, in [11] it is shown that the upper bou
Some 2 2 Unitary Space-Time Codes from Sphere Packing Theory with Optimal Diversity Product of Code Size 6 Haiquan Wang Genyuan Wang Xiang-Gen Xia Abstract In this correspondence, we propose some new designs
More information1 Linear Algebra Problems
Linear Algebra Problems. Let A be the conjugate transpose of the complex matrix A; i.e., A = A t : A is said to be Hermitian if A = A; real symmetric if A is real and A t = A; skew-hermitian if A = A and
More information