ISSN Technical Report

Noise Suppression in Training Data for Improving Generalization

Akiko Nakashima, Akira Hirabayashi, and Hidemitsu Ogawa

TR96-9, November

Department of Computer Science, Tokyo Institute of Technology,
2-12-1 Ookayama, Meguro-ku, Tokyo 152, Japan

© The author(s) of this report reserves all the rights.

To appear in IEEE International Joint Conference on Neural Networks '98.

Noise Suppression in Training Data for Improving Generalization

Akiko Nakashima, Akira Hirabayashi and Hidemitsu Ogawa
Dept. of Computer Science, Tokyo Institute of Technology, 2-12-1 O-okayama, Meguro-ku, Tokyo 152, Japan.

Abstract

Multi-layer feedforward neural networks are trained using the error back-propagation (BP) algorithm. This algorithm minimizes the error between the outputs of a neural network (NN) and the training data. Hence, in the case of noisy training data, a trained network memorizes noisy outputs for given inputs. Such learning is called rote memorization learning (RML). In this paper we propose error correcting memorization learning (CML), which can suppress noise in training data. In order to evaluate the generalization ability of CML, it is compared with the projection learning (PL) criterion. It is theoretically proved that, although CML merely suppresses noise in training data, it provides the same generalization as PL under a certain necessary and sufficient condition.

1 Introduction

The learning problem of feed-forward neural networks is considered from the functional analytic point of view with noisy training data. What is important in the learning problem is to achieve a high level of generalization, that is, to construct a neural network which outputs true values not only for training inputs, but also for novel inputs. The back-propagation algorithm is often used for training a neural network. It is derived from the criterion of so-called rote memorization learning (RML), which minimizes the error between the outputs of a neural network and the noisy training data. Hence, RML does not guarantee generalization ability. In order to solve this problem, regularization methods were proposed [5], [2]. However, they still use the criterion of RML together with a smoothness term. In this paper, we propose error correcting memorization learning (CML) to suppress noise in training data. The generalization ability of CML is evaluated by comparing it with projection learning (PL), which reduces errors in the original function space. We obtain a necessary and sufficient condition under which CML not only suppresses noise in the training data but also improves generalization. It is known that RML also provides the same generalization as PL under some condition. Although an analytical solution was provided for this condition in [8], here we use the results on CML to interpret and clarify that solution.

2 Neural network learning as an inverse problem

In this section, we shall present a brief review of the basic formalization necessary for discussing the learning problem in NNs from the functional analytic point of view. Let us begin by considering a three-layer feedforward neural network whose numbers of input, hidden, and output units are L, N, and 1, respectively, as shown in Fig. 1. Let x be the L-dimensional vector consisting of the L inputs x_i (i = 1, ..., L), which is referred to as the input vector. The network can be considered as a real-valued function f(x) of L variables.

[Figure 1: NN as a real-valued function, f(x) = Σ_{n=1}^{N} w_n u_n(x), where u_n(x) is the output of the n-th hidden unit and w_n is the corresponding output weight.]

The learning problem is to construct a neural network by using a set of training data so that the NN expresses the best approximation f̂(x) to a desired function f(x) under some learning criterion.
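For concreteness, the following small sketch evaluates a three-layer network of the form shown in Fig. 1, f(x) = Σ_{n=1}^{N} w_n u_n(x). The sigmoidal hidden units and all numerical values are illustrative assumptions, not taken from the report.

```python
import numpy as np

def hidden_unit(x, a, b):
    """Hidden-unit output u_n(x); a sigmoid of an affine map is an illustrative choice."""
    return 1.0 / (1.0 + np.exp(-(np.dot(a, x) + b)))

def network_output(x, W_hidden, b_hidden, w_out):
    """Three-layer network with L inputs, N hidden units, one output:
    f(x) = sum_n w_n * u_n(x)."""
    u = np.array([hidden_unit(x, a, b) for a, b in zip(W_hidden, b_hidden)])
    return float(np.dot(w_out, u))

# Example with L = 2 inputs and N = 3 hidden units (all numbers are made up).
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 2))   # input-to-hidden weights
b_hidden = rng.normal(size=3)        # hidden biases
w_out = rng.normal(size=3)           # output weights w_n

x = np.array([0.5, -1.0])
print(network_output(x, W_hidden, b_hidden, w_out))
```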

We define some of the notations used here.

{x_m}_{m=1}^{M} : A training set, given as a set of M input vectors.
{y_m}_{m=1}^{M} : The corresponding noisy output values, where y_m = f(x_m) + n_m.
{x_m, y_m}_{m=1}^{M} : A set of training data.

Once a training set {x_m}_{m=1}^{M} is fixed, the corresponding true outputs {f(x_m)}_{m=1}^{M} are uniquely determined from f. Hence, we can introduce an operator A which maps f to the vector consisting of {f(x_m)}_{m=1}^{M}. Let y and n be the M-dimensional vectors consisting of the elements {y_m}_{m=1}^{M} and {n_m}_{m=1}^{M}, respectively. Then we have

    y = Af + n.    (1)

The operator A is called the sampling operator. It is a linear operator even when we are concerned with nonlinear NNs.

Let H be the set of all functions f to be approximated by the neural networks. Assume that H is a Hilbert space with a reproducing kernel K(x, x'). Let D be the domain of the functions f, which is a subset of the L-dimensional Euclidean space R^L. The reproducing kernel K(x, x') is a bivariate function defined on D × D which satisfies the following two conditions:

1. For any fixed x' in D, K(x, x') is a function of x in H.
2. For any f in H and x' in D, it holds that

    (f(·), K(·, x')) = f(x'),    (2)

where the left-hand side of eq. (2) denotes the inner product in H.

In the theory of Hilbert spaces, arguments are developed by regarding a function as a point in that space. Thus, things such as 'the value of a function at a point' cannot be discussed under the general framework of Hilbert spaces. However, if the Hilbert space has a reproducing kernel, then it is possible to deal with the value of a function at a point, as shown in eq. (2). The sampling operator A is expressed by the reproducing kernel as

    A = Σ_{m=1}^{M} (e_m ⊗ K(·, x_m)),    (3)

where {e_m}_{m=1}^{M} is the so-called natural basis in R^M, i.e., e_m is the M-dimensional vector whose elements are all zero except for the m-th element, which equals 1. The notation (· ⊗ ·) is the Schatten product, defined by

    (e_m ⊗ g)f = (f, g) e_m.    (4)

Now the learning problem is the problem of obtaining an estimate, say f̂, of f from y in the model (1). This can be considered as an inverse problem [4], equivalent to obtaining an operator X which provides f̂ from y:

    f̂ = Xy.    (5)

The operator X is called the learning operator. It can be optimized based on different learning criteria [4]. We denote a criterion by J in general, and the operator X satisfying J by A^{(J)}.

3 Rote memorization learning

The BP method minimizes the training error, that is,

    Σ_{m=1}^{M} (f̂(x_m) − y_m)².    (6)

Hence, the learning criterion for the BP method is as follows.

Definition 1 (Rote memorization learning) If an operator X minimizes the functional

    J_RM[X] = ||AXy − y||²,    (7)

X is called the rote memorization learning (RML) operator and denoted by A^{(RM)}, where ||·|| is the norm in R^M.

A general form of the RML operator is given as

    A^{(RM)} = A† + Y − A†AY,    (8)

where A† is the Moore-Penrose generalized inverse of A [1] and Y is an arbitrary operator from R^M to H. J_RM requires only to memorize the given noisy training data by rote.
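As a computational illustration of eqs. (3)-(8), the sketch below represents functions of a finite-dimensional reproducing kernel Hilbert space by their coefficient vectors, builds the sampling operator A as an M × dim(H) matrix, and computes the minimum-norm RML solution f̂ = A†y (eq. (8) with Y = 0). It uses, as in the report's later example, the two-dimensional space spanned by sin x and cos x; the training inputs and noise values are illustrative assumptions.

```python
import numpy as np

# Orthonormal basis of a small function space H (as in the report's example, H = span{sin x, cos x}).
basis = [np.sin, np.cos]

def sampling_matrix(x_train):
    """Matrix of the sampling operator A: (Af)_m = f(x_m).
    In an orthonormal basis {phi_n}, A[m, n] = phi_n(x_m)."""
    return np.array([[phi(x) for phi in basis] for x in x_train])

def rml_solution(A, y):
    """Minimum-norm RML operator A^(RM) = A^dagger (eq. (8) with Y = 0)."""
    return np.linalg.pinv(A) @ y

# Illustrative training data: y = A f + n with made-up numbers.
x_train = np.array([0.3, 1.1, 2.0])
f_true = np.array([0.5, 3.0])                 # coefficients of the target function f
A = sampling_matrix(x_train)
noise = np.array([0.2, -0.1, 0.15])
y = A @ f_true + noise

f_rml = rml_solution(A, y)
print("RML coefficients:", f_rml)
# The RML solution minimizes ||A f_hat - y||^2, i.e. the rote memorization criterion (7).
print("training error:", np.linalg.norm(A @ f_rml - y))
```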

4 Error correcting memorization learning

When we expect a NN to output the correct values for the given inputs, the mean squared error between the outputs of the NN and the correct values of the noisy training data should be minimized. The error is expressed by

    E_n[ Σ_{m=1}^{M} (f̂(x_m) − f(x_m))² ] = E_n[ ||Af̂ − Af||² ],    (9)

where E_n denotes the expectation over the noise ensemble {n}. Using eqs. (1) and (5), we are able to decompose the first term Af̂ in the right-hand side of eq. (9) as

    Af̂ = AXAf + AXn.    (10)

The first and second terms on the right-hand side of eq. (10) denote the signal component and the noise component of Af̂, respectively. The former is deterministic, whereas the latter is probabilistic in nature. Therefore, we require that the signal component of Af̂ agrees with the true values Af of the training data. This leads us to the concept of error correcting memorization learning, as follows.

Definition 2 (Error correcting memorization learning) For any f̂ given by eqs. (5) and (1), if an operator X minimizes the functional

    J_CM[X] = E_n[ ||Af̂ − Af||² ]    (11)

under the constraint

    AXA = A,    (12)

X is called the error correcting memorization learning (CML) operator and denoted by A^{(CM)}.

Theorem 1 A general form of the CML operator is given as

    A^{(CM)} = V†A*U† + Y − A†AYUU†,    (13)

where A* is the adjoint operator of A, U and V are defined as

    U = AA* + Q,  V = A*U†A,    (14)

Q is the correlation matrix of the noise, and Y is an arbitrary operator from R^M to H. The minimum value of J_CM[X] is given by

    min_X J_CM[X] = J_CM[A^{(CM)}] = tr(AV†A*) − tr(AA*).    (15)

The correlation matrix of the noise is the M × M matrix defined by Q = E_n(n ⊗ n), whose ij-th component is E_n(n_i n_j).

Eq. (13) is determined by the operators A, Q, and Y. A is obtained from a training set as shown in eq. (3). Q is the correlation matrix, which is determined by the nature of the noise. Note that the noise is not limited to, for example, a normal distribution or a zero-mean distribution. Hence, we can apply the theorem to any type of noise as long as Q can be estimated.

Statistically, almost all noisy training data y lie in the range of U, denoted by R(U) [6]. It has the following (generally non-orthogonal) direct sum decomposition:

    R(U) = R(A) ∔ QR(A)⊥,    (16)

where R(A)⊥ is the orthogonal complement of R(A). This decomposition yields the following result.

Theorem 2 An operator X satisfies the CML criterion if and only if

    AXy = y for y ∈ R(A),  AXy = 0 for y ∈ QR(A)⊥.    (17)

Theorem 2 shows the mechanism of noise suppression by CML, which is illustrated in Fig. 2.

[Figure 2: Learned functions and the mechanism of noise suppression (the spaces H and R^M with the subspaces R(A*), N(A), R(A), and QR(A)⊥).]

Let us consider a vector y in R(U). It is decomposed using eq. (16) as

    y = Af + n₁ + n₂,    (18)

where n₁ and n₂ are the R(A)-component and the QR(A)⊥-component of n, respectively. For this vector y,

    Af̂ = AA^{(CM)}y = Af + n₁,    (19)

which is the R(A)-component of y. CML removes the QR(A)⊥-component n₂ of n. In this sense, QR(A)⊥ is the subspace which results in optimal noise suppression.
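The minimum-norm CML operator (eq. (13) with Y = 0) is easy to form numerically once A and Q are available as matrices. The sketch below is a non-authoritative illustration with made-up numbers: it builds U = AA* + Q and V = A*U†A as in eq. (14), forms A^(CM) = V†A*U†, and checks the constraint (12) and the first half of Theorem 2.

```python
import numpy as np

def cml_operator(A, Q):
    """Minimum-norm CML operator A^(CM) = V^dagger A^* U^dagger (eq. (13) with Y = 0)."""
    U = A @ A.T + Q                      # eq. (14): U = A A* + Q
    V = A.T @ np.linalg.pinv(U) @ A      # eq. (14): V = A* U^dagger A
    return np.linalg.pinv(V) @ A.T @ np.linalg.pinv(U)

# Illustrative matrices: 3 training points, 2-dimensional H (numbers are made up).
A = np.array([[0.3, 0.9],
              [0.8, 0.4],
              [0.9, -0.1]])
Q = np.diag([0.09, 0.04, 0.01])          # noise correlation matrix (illustrative, zero-mean noise)

X = cml_operator(A, Q)
print("constraint AXA = A holds:", np.allclose(A @ X @ A, A))           # eq. (12)

y_signal = A @ np.array([0.5, 3.0])      # a noise-free vector lying in R(A)
print("AXy = y on R(A):", np.allclose(A @ X @ y_signal, y_signal))      # Theorem 2, first case
```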

Next, we show results of CML on an artificial problem. Let H be the 2-dimensional function space spanned by

    {φ_n(x)}_{n=1}^{2} = {sin x, cos x}    (20)

and let the inner product in H be defined as

    (f, g) = (1/π) ∫_{−π}^{π} f(x) g(x) dx,  f, g ∈ H.    (21)

Then {φ₁, φ₂} in eq. (20) becomes an orthonormal basis of H. The reproducing kernel of H is given by

    K(x, x') = Σ_{n=1}^{2} φ_n(x) φ_n(x')    (22)
             = cos(x − x').    (23)

Let us now consider the problem of learning a function

    f(x) = 0.5 sin x + 3 cos x    (24)

within this function space H. We consider two different experiments: the first (Fig. 3) and the second (Fig. 4) each use a set of 3 training points {x_m}_{m=1}^{3}, but the two sets are different. Here, we assume that the noise is generated from the three-dimensional normal distribution with diagonal covariance matrix diag(0.3², 0.2², 0.1²) and mean E_n n = (0.8, 1.0, 1.5)ᵗ. Then, the noise correlation matrix is given by

    Q = ( 0.73  0.8   1.2
          0.8   1.04  1.5
          1.2   1.5   2.26 ).    (25)

In Fig. 3 and Fig. 4, the true sampled values {f(x_m)}_{m=1}^{3} and the noisy training data y are denoted by 'o' and '*', respectively. The original function f is denoted by a solid line, while the learned functions f̂ by CML and RML are shown by a dashed line and a dotted line, respectively.

[Figure 3: Functions learned by CML and RML with the first training set ('*': training data; solid: original function; dashed: CML; dotted: RML).]

[Figure 4: Functions learned by CML and RML with the second training set ('*': training data; solid: original function; dashed: CML; dotted: RML).]

As for the training error, both experiments give similar results. The function f̂ learned by CML passes near the true points denoted by 'o', whereas f̂ learned by RML passes near the noisy data points denoted by '*'. This example shows that CML certainly suppresses noise in the training data. As for generalization, the experimental results are somewhat different: CML in Fig. 3 provides a better approximation than CML in Fig. 4. This result shows that the generalization by CML depends on the training set {x_m}_{m=1}^{3}.

5 Admissibility

CML evaluates the error only over the training set. Therefore, even when a network is successfully trained for the given inputs by CML, it is not guaranteed that f̂ provides the desired output values for novel inputs, as shown in Fig. 4. When we expect CML to achieve higher generalization, we implicitly use the CML criterion as a substitute for some true criterion J which directly estimates the generalization error. In order to discuss the conditions under which we can substitute J_CM for J, the concept of admissibility is useful [7]. Consider the general case where a criterion J' substitutes for a criterion J. Generally, there are many learning operators which satisfy a given criterion J; the set of them is denoted by A{J}.
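The correlation matrix in eq. (25) is just the covariance plus the outer product of the mean, Q = E_n(n nᵗ) = Σ + μμᵗ. The sketch below is an illustrative reconstruction of this kind of experiment rather than the authors' code: it forms Q in that way and compares the minimum-norm RML and CML estimates on one noisy sample. The training inputs are assumed values, not those used in the report's figures.

```python
import numpy as np

rng = np.random.default_rng(1)

# H = span{sin x, cos x}; functions are represented by their coefficient vectors.
def phi(x):
    return np.array([np.sin(x), np.cos(x)])

x_train = np.array([1.8, 0.4, 0.0])       # assumed training inputs (not from the report)
A = np.vstack([phi(x) for x in x_train])  # sampling operator, row m = (sin x_m, cos x_m)

f_true = np.array([0.5, 3.0])             # target f(x) = 0.5 sin x + 3 cos x, eq. (24)

mu = np.array([0.8, 1.0, 1.5])                    # noise mean
Sigma = np.diag([0.3**2, 0.2**2, 0.1**2])         # noise covariance
Q = Sigma + np.outer(mu, mu)                      # correlation matrix Q = E(n n^T), eq. (25)

n = rng.multivariate_normal(mu, Sigma)            # one noise realization
y = A @ f_true + n                                # noisy training data, eq. (1)

f_rml = np.linalg.pinv(A) @ y                     # minimum-norm RML estimate
U = A @ A.T + Q
V = A.T @ np.linalg.pinv(U) @ A
f_cml = np.linalg.pinv(V) @ A.T @ np.linalg.pinv(U) @ y   # minimum-norm CML estimate

print("true coefficients:", f_true)
print("RML estimate:     ", f_rml)
print("CML estimate:     ", f_cml)
```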

Definition 3 (Admissibility) [7]
(i) (Non-admissibility) If no J'-learning satisfies J, i.e., if it holds that

    A{J} ∩ A{J'} = ∅,    (26)

then it is said that J' does not admit J.
(ii) (Partial admissibility) If there is at least one J'-learning which satisfies J, i.e., if it holds that

    A{J} ∩ A{J'} ≠ ∅,    (27)

then it is said that J' partially admits J.
(iii) (Admissibility) If all J'-learnings satisfy J, i.e., if it holds that

    A{J} ⊇ A{J'},    (28)

then it is said that J' always admits J, or in brief, J' admits J.
(iv) (Complete admissibility) If J' always admits J and vice versa, i.e., if it holds that

    A{J} = A{J'},    (29)

then it is said that J' completely admits J.
(v) (Inverse admissibility) If all J-learnings satisfy J', i.e., if it holds that

    A{J} ⊆ A{J'},    (30)

then it is said that J' is always admitted by J.

Eq. (28) means that J' is sufficient for J, while eq. (30) means that J' is necessary for J. Based on the concept of admissibility, we shall discuss the generalization ability of CML in the next section.

6 Generalization ability of CML

In this section, as an example of the true criterion J, we shall consider projection learning (PL) [3]. Let P be the orthogonal projection operator onto R(A*).

Definition 4 (Projection learning) [3] For any f̂ given by eqs. (5) and (1), if an operator X minimizes the functional

    J_P[X] = E_n[ ||f̂ − Pf||² ]    (31)

under the constraint

    XA = P,    (32)

X is called the projection learning (PL) operator and denoted by A^{(P)}, where ||·|| is the norm in H.

Whenever we use a linear operator X for constructing f̂, the range of X becomes a subspace of H. Hence 'the best approximation' implies that f̂ is the nearest point to f in the subspace R(X), i.e., the orthogonal projection of f onto R(X). R(A*) is the largest subspace in which we can obtain the orthogonal projection of f from y without knowing the original f. That is the reason why in eq. (31) the error is evaluated not between f̂ and f but between f̂ and Pf. Eqs. (5) and (1) yield

    f̂ = XAf + Xn.    (33)

The first term XAf on the right-hand side of eq. (33) is the signal component of f̂. It is independent of the noise n in y. Hence, it is required that the signal component of f̂ agrees with the best approximation Pf of f in R(A*), which is expressed as the constraint in the constrained optimization of eq. (32).

Let us consider the case where the CML criterion J_CM is used as a substitute for the PL criterion J_P. In this case, the following two kinds of admissibility appear among the five listed in Section 5.

Theorem 3 (Inverse admissibility) All PL operators satisfy the CML criterion, i.e., it always holds that

    A{J_P} ⊆ A{J_CM}.    (34)

Theorem 4 (Complete admissibility) The PL criterion completely admits the CML criterion, i.e., it holds that

    A{J_P} = A{J_CM}    (35)

if and only if

    N(A) = {0}    (36)

or

    N(A) = H and R(Q) = {0},    (37)

where N(A) represents the null space of A.

Theorem 3 says that any projection learning operator A^{(P)} can always suppress noise in training data. Theorem 3 and Theorem 4 show that there are A^{(CM)}'s which do not satisfy J_P in general. Hence, noise suppression in the training data is not enough for CML to obtain the same generalization as that of PL; additionally, eq. (36) or eq. (37) has to be satisfied. N(A) is the subspace consisting of the functions which are mapped to the zero vector by the sampling operator A. Eq. (37) refers to the condition that all the training data are statistically always zero, which does not make sense in practical learning problems. Therefore, eq. (36) is the more essential condition for complete admissibility.
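Whether eq. (36) holds can be checked directly from the training inputs when H is finite-dimensional: N(A) = {0} exactly when the sampling matrix has rank equal to dim H. The sketch below is an illustrative check for the sin/cos space of Section 4; the two candidate training sets are assumed values, not those used in Figs. 3 and 4.

```python
import numpy as np

def phi(x):
    """Orthonormal basis of H = span{sin x, cos x} evaluated at x."""
    return np.array([np.sin(x), np.cos(x)])

def null_space_is_trivial(x_train):
    """Eq. (36): N(A) = {0} iff rank(A) = dim(H), where A[m] = phi(x_m)."""
    A = np.vstack([phi(x) for x in x_train])
    return np.linalg.matrix_rank(A) == A.shape[1]

# Assumed example sets: the first samples the space adequately, the second does not
# (all of its rows are proportional, so A has a non-trivial null space).
print(null_space_is_trivial([1.8, 0.4, 0.0]))        # True: CML coincides with PL
print(null_space_is_trivial([0.0, np.pi, 2*np.pi]))  # False: sin x = 0 at every point
```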
When eq. (36) does not hold, the situation is as follows. In Fig. 2, N(A), which contains many non-zero functions, is denoted by a line perpendicular to R(A*).

From eq. (13), the function learned by CML from y in R(U) is given as

    f̂ = V†A*U†y + f₀,    (38)

where f₀ is an arbitrary function in N(A). The first term on the right-hand side of eq. (38) is the function obtained by PL from the same y. Hence, the generalization error of the function in eq. (38) depends on the selection of f₀. When eq. (36) holds, N(A) consists only of the zero function. Then, the function in eq. (38) becomes equal to the function obtained by PL. Whether eq. (36) holds or not depends on the training set {x_m}_{m=1}^{M}, because the sampling operator A is determined by the training set as shown in eq. (3). Hence, the generalization by CML depends on the selection of the training set. The training set used in Fig. 3 satisfies eq. (36), whereas the training set used in Fig. 4 does not. This difference causes the variation in generalization between Fig. 3 and Fig. 4.

7 Generalization ability of RML

Although RML does not consider suppression of noise, it provides the same generalization as PL under some conditions. In this section we shall interpret those conditions by using the results on CML. Since statistically almost all y lie in R(U), let us consider RML with y limited to R(U). Complete admissibility holds for CML if eq. (36) is satisfied, while an additional condition,

    QR(A)⊥ ⊆ R(A)⊥,    (39)

is necessary for RML. The left-hand side of eq. (39) is the subspace which results in optimal noise suppression in the sense of CML, as shown by Theorem 2. RML also suppresses noise to some extent, although this seems contradictory to the RML criterion. RML constructs f̂ so that Af̂ becomes the best approximation to the noisy y. Af̂ belongs to R(A), even though y does not belong to R(A) in general because of the noise. Hence, the best approximation to y is the orthogonal projection of y onto R(A). As a result, the component of the noise in R(A)⊥ is removed independently of the nature of the noise. When R(A)⊥ includes QR(A)⊥, as in eq. (39), for any y in R(U) the component in R(A)⊥ becomes equal to the component in QR(A)⊥. Then RML can suppress as much noise in the training data as CML.
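Condition (39) can also be tested numerically in a finite-dimensional setting: take a basis of R(A)⊥ in R^M, apply Q to it, and check that the result stays orthogonal to R(A). The snippet below is a sketch using the same illustrative matrices as before, not the report's procedure.

```python
import numpy as np
from scipy.linalg import null_space

def condition_39_holds(A, Q, tol=1e-10):
    """Check QR(A)^perp subset of R(A)^perp: the columns of Q N must stay
    orthogonal to R(A), where N spans R(A)^perp = N(A^T)."""
    N = null_space(A.T)          # orthonormal basis of R(A)^perp in R^M
    if N.size == 0:              # R(A) = R^M, the condition holds trivially
        return True
    return np.allclose(A.T @ (Q @ N), 0.0, atol=tol)

# Illustrative matrices (made-up numbers).
A = np.array([[0.3, 0.9],
              [0.8, 0.4],
              [0.9, -0.1]])
print(condition_39_holds(A, np.diag([0.09, 0.04, 0.01])))  # generally False for an arbitrary diagonal Q
print(condition_39_holds(A, 0.05 * np.eye(3)))             # True: Q proportional to I maps R(A)^perp into itself
```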
8 Conclusions

We proposed error correcting memorization learning. It can suppress noise in training data using the noise correlation matrix. By comparing the generalization ability of CML with that of PL, we obtained a necessary and sufficient condition under which CML provides better generalization. In the case of RML, a further condition is necessary. We interpreted the meaning of this additional condition using the results on CML.

Acknowledgements

We would like to thank Mr. S. Vijayakumar for fruitful discussions. This work was supported by the Grants-in-Aid for Scientific Research.

References

[1] A. Albert, Regression and the Moore-Penrose Pseudoinverse, Academic Press (1972).
[2] C. M. Bishop, "Improving the generalization properties of radial basis function networks", Neural Computation, vol.3, no.4, pp.579-588 (Winter 1991).
[3] H. Ogawa, "Projection filter regularization of ill-conditioned problem", Proc. SPIE, Inverse Problems in Optics, vol.808, pp.189-196 (1987).
[4] H. Ogawa, "Neural network learning, generalization and over-learning", Proc. ICIIPS'92 (Beijing), Oct.-Nov. 1992, vol.2, pp.1-6 (1992).
[5] T. Poggio and F. Girosi, "Networks for approximation and learning", Proc. of the IEEE, vol.78, no.9, pp.1481-1497 (Sep. 1990).
[6] Y. Yamashita and H. Ogawa, "Properties of averaged projection filter for image restoration", Trans. IEICE, Japan, vol.J74-D-II, no.2 (Feb. 1991) (in Japanese).
[7] H. Ogawa and Y. Yamasaki, "A theory of over-learning", Trans. IEICE, Japan, vol.J76-D-II, no.7 (June 1993) (in Japanese); an English short version appeared in Artificial Neural Networks 2, vol.1, I. Aleksander and J. Taylor, Eds., North-Holland, pp.25-28 (1992).
[8] A. Hirabayashi and H. Ogawa, "Admissibility of memorization learning with respect to projection learning in the presence of noise", Proc. ICNN'96 (Washington, D.C.), Jun. 3-6, 1996.
