Reproducing Kernel Hilbert Spaces for Penalized Regression: A tutorial


Alvaro Nosedal-Sanchez (a), Curtis B. Storlie (b), Thomas C.M. Lee (c), Ronald Christensen (d)
(a) Indiana University of Pennsylvania, (b) Los Alamos National Laboratory, (c) University of California, Davis, (d) University of New Mexico

December 8, 2010

Abstract

Penalized regression procedures have become very popular ways to estimate complex functions. The smoothing spline, for example, is the solution of a minimization problem in a functional space. If such a minimization problem is posed on a reproducing kernel Hilbert space (RKHS), the solution is guaranteed to exist, is unique, and has a very simple form. There are excellent books and articles about RKHSs and their applications in statistics; however, this existing literature is very dense. Our intention is to provide a friendly reference for a reader approaching this subject for the first time. This paper begins with a simple problem, a system of linear equations, and then gives an intuitive motivation for reproducing kernels. Armed with the intuition gained from our first examples, we take the reader from vector spaces, to Banach spaces, and to RKHSs. Finally, we present some statistical estimation problems that can be solved using the mathematical machinery discussed. After reading this tutorial, the reader will be ready to study more advanced texts and articles about the subject, such as Wahba (1990) or Gu (2002).

1 Introduction

Penalized regression procedures have become a very popular approach to estimating complex functions (Wahba 1990, Eubank 1999, Hastie, Tibshirani & Friedman 2001). These procedures use an estimator that is defined as the solution to a minimization problem. In any minimization problem, there are the following questions: Does the solution exist? If yes, is the solution unique? How can we find it? If the problem is posed in the reproducing kernel Hilbert space (RKHS) framework that we discuss below, then the solution is guaranteed to exist, it is unique, and it takes a particularly simple form. Reproducing kernel Hilbert spaces and reproducing kernels play a central role in penalized regression. In the first section of this tutorial we present, and solve, simple problems while gently introducing key concepts. The remainder of the paper is organized as follows. Section 2 takes the reader from a basic understanding of fields through Banach spaces and Hilbert spaces.

In Section 3, we provide elementary theory of RKHSs along with some examples. Section 4 discusses penalized regression with RKHSs. Two specific examples involving ridge regression and smoothing splines are given, with R code to solidify the concepts. Some methods for smoothing parameter selection are briefly mentioned at the end. Section 5 contains some closing remarks.

1.1 Why should we care about Reproducing Kernel Hilbert Spaces?

Before introducing new concepts, we present some simple illustrations of the tools used to solve problems in the RKHS framework. Consider solving the system of linear equations

x_1 + x_3 = 0,   (1)
x_2 = 1.   (2)

Clearly the real-valued solutions to this system are the vectors x^t = (-α, 1, α) for α ∈ R. Suppose we want to find the smallest solution. Under the usual squared norm ||x||^2 = x_1^2 + x_2^2 + x_3^2, the smallest solution is x_s^t = (0, 1, 0).

Now consider a more general problem. For a given p × n matrix R and n × 1 matrix η, solve

R^t x = η,   (3)

where R^t is the transpose of R, x and the columns of R, say R_k, k = 1, 2, ..., n, are all in R^p, and η ∈ R^n. We wish to find the solution x_s that minimizes the norm ||x|| = sqrt(x^t x). We solve the problem using concepts that extend to RKHSs.

A solution x* (not necessarily a minimum norm solution) exists whenever η ∈ C(R^t). Here C(R^t) denotes the column space of R^t. Given one solution x*, all solutions x must satisfy R^t x = R^t x*, or R^t(x - x*) = 0. The vector x* can be written uniquely as x* = x_0 + x_1 with x_0 ∈ C(R) and x_1 ∈ C(R)^⊥, where C(R)^⊥ is the orthogonal complement of C(R). Clearly, x_0 is a solution because R^t(x* - x_0) = R^t x_1 = 0. In fact, x_0 is both the unique solution in C(R) and the minimum norm solution. If x̃_0 is any other solution in C(R), then R^t(x_0 - x̃_0) = 0, so we have both (x_0 - x̃_0) ∈ C(R) and (x_0 - x̃_0) ⊥ C(R), two sets whose intersection is only the 0 vector. Thus x_0 - x̃_0 = 0 and x_0 = x̃_0. In other words, every solution x has the same x_0 vector. Finally, x_0 is also the minimum norm solution because the arbitrary solution x has x^t x = x_0^t x_0 + x_1^t x_1 ≥ x_0^t x_0. We have established the existence of a unique, minimum norm solution in C(R) that can be written as

x_s ≡ x_0 = Rξ = Σ_{k=1}^n ξ_k R_k,   (4)

for some ξ_k, k = 1, ..., n.
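The small system above can also be solved numerically. The following R sketch is ours, not part of the original text; it forms R and η for the system (1)-(2) and solves the normal equations obtained by substituting (4) into (3), i.e. R^t R ξ = η:

# Minimum norm solution of R^t x = eta for the system (1)-(2)
R.mat <- cbind(c(1, 0, 1),               # column R_1
               c(0, 1, 0))               # column R_2
eta   <- c(0, 1)
xi    <- solve(t(R.mat) %*% R.mat, eta)  # solves R^t R xi = eta; gives xi = (0, 1)
x.s   <- R.mat %*% xi                    # x_s = R xi = (0, 1, 0), the minimum norm solution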

To find x_s explicitly, write x_s = Rξ and the defining equation (3) becomes

R^t R ξ = η,   (5)

which is just a system of linear equations. Even if there exist multiple solutions ξ, Rξ is unique.

Now we use this framework to find the smallest solution to the system of equations (1) and (2). In the general framework we have x^t = (x_1, x_2, x_3), η^t = (0, 1), R_1^t = (1, 0, 1), R_2^t = (0, 1, 0). We know that the solution has the form (4) and we also know that we have to solve a system of equations given by (5). In this case, the system of equations is

2ξ_1 + 0ξ_2 = 0,
0ξ_1 + 1ξ_2 = 1.

The solution to the system is (ξ_1, ξ_2) = (0, 1), which implies that our solution to the original problem is x_s = 0R_1 + 1R_2 = (0, 1, 0)^t, as expected.

Virtually the same methods can be used to solve a similar problem in any inner-product space Ω. As discussed later, an inner product ⟨·, ·⟩ assigns real numbers to pairs of vectors. For given vectors R_k ∈ Ω and numbers η_k ∈ R, find x ∈ Ω such that

⟨R_k, x⟩ = η_k, k = 1, 2, ..., n,   (6)

for which the norm of x, ||x|| = sqrt(⟨x, x⟩), is minimal. The solution has the form

x_s = Σ_{k=1}^n ξ_k R_k,   (7)

with the ξ_k satisfying the linear equations

Σ_{k=1}^n ⟨R_i, R_k⟩ ξ_k = η_i, i = 1, ..., n.

For a formal proof see Máté (1990). In RKHS applications, vectors are typically functions. We now apply this result to the interpolating spline problem.

Interpolating Splines. Suppose we want to find a function f(t) that interpolates between the points (t_k, η_k), k = 0, 1, 2, ..., n, where η_0 = 0 and 0 = t_0 < t_1 < ... < t_n = 1. We restrict attention to functions f ∈ F where

F = {f : f is absolutely continuous on [0, 1], f(0) = 0, f' ∈ L_2[0, 1]}.

Throughout, f^(m) denotes the m-th derivative of f, with f' ≡ f^(1) and f'' ≡ f^(2). The restriction that η_0 = f(0) = 0 is not really necessary, but simplifies the presentation. We want to find the smoothest function f(t) that satisfies f(t_k) = η_k, k = 1, ..., n. Defining an inner product on F by

⟨f, g⟩ = ∫_0^1 f'(x) g'(x) dx

implies a norm over the space F that is small for smooth functions. To address the interpolation problem, note that the functions R_k(s) ≡ min(s, t_k), k = 1, 2, ..., n, have R_k(0) = 0 and the property that ⟨R_k, f⟩ = f(t_k), because

⟨f, R_k⟩ = ∫_0^1 f'(s) R_k'(s) ds = ∫_0^{t_k} f'(s) · 1 ds + ∫_{t_k}^1 f'(s) · 0 ds = ∫_0^{t_k} f'(s) ds = f(t_k) - f(0) = f(t_k).

Thus, an interpolator f satisfies a system of equations like (6), namely

f(t_k) = ⟨R_k, f⟩ = η_k, k = 1, ..., n,   (8)

and by (7), the smoothest function f (minimum norm) that satisfies the requirements has the form

f̂(t) = Σ_{k=1}^n ξ_k R_k(t).

The ξ_j's are the solutions to the system of real linear equations obtained by substituting f̂ into (8),

Σ_{j=1}^n ⟨R_k, R_j⟩ ξ_j = η_k, k = 1, 2, ..., n.

Note that

⟨R_k, R_j⟩ = R_j(t_k) = R_k(t_j) = min(t_k, t_j),

and define the function

R(s, t) = min(s, t),

which turns out to be a reproducing kernel.

Numerical example. Given points f(t_i) = η_i, say, f(0) = 0, f(.1) = .1, f(.25) = 1, f(.5) = 2, f(.75) = 1.5, and f(1) = 1.75, we find

arg min_{f ∈ F} ||f||^2 = ∫_0^1 f'(x)^2 dx.

The system of equations is

.1ξ_1 + .1ξ_2 + .1ξ_3 + .1ξ_4 + .1ξ_5 = .1
.1ξ_1 + .25ξ_2 + .25ξ_3 + .25ξ_4 + .25ξ_5 = 1
.1ξ_1 + .25ξ_2 + .5ξ_3 + .5ξ_4 + .5ξ_5 = 2
.1ξ_1 + .25ξ_2 + .5ξ_3 + .75ξ_4 + .75ξ_5 = 1.5
.1ξ_1 + .25ξ_2 + .5ξ_3 + .75ξ_4 + 1ξ_5 = 1.75

The solution is ξ = (-5, 2, 6, -3, 1)^t, which implies that our function is

f̂(t) = -5R_1(t) + 2R_2(t) + 6R_3(t) - 3R_4(t) + 1R_5(t)   (9)
     = -5R(t, t_1) + 2R(t, t_2) + 6R(t, t_3) - 3R(t, t_4) + 1R(t, t_5)

or, adding the slopes for t > t_i and finding the intercepts,

f̂(t) = t          for 0 ≤ t ≤ .1
       6t - .5     for .1 ≤ t ≤ .25
       4t          for .25 ≤ t ≤ .5
       -2t + 3     for .5 ≤ t ≤ .75
       t + .75     for .75 ≤ t ≤ 1.

This is the linear interpolating spline, as can be seen graphically in Figure 1.
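This computation is easy to reproduce numerically. The short R sketch below is ours (the names t.k, eta, K and xi are not from the paper); it builds the Gram matrix ⟨R_k, R_j⟩ = min(t_k, t_j), solves for ξ, and evaluates the interpolant:

# Linear interpolating spline for the numerical example, via the kernel R(s,t) = min(s,t)
t.k <- c(.1, .25, .5, .75, 1)      # knots t_1, ..., t_5
eta <- c(.1, 1, 2, 1.5, 1.75)      # interpolation values eta_1, ..., eta_5
K   <- outer(t.k, t.k, pmin)       # Gram matrix with entries min(t_k, t_j)
xi  <- solve(K, eta)               # coefficients; gives (-5, 2, 6, -3, 1)
f.hat <- function(t) colSums(xi * outer(t.k, t, pmin))  # f.hat(t) = sum_k xi_k * min(t, t_k)
f.hat(t.k)                         # reproduces eta: .1, 1, 2, 1.5, 1.75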

For this illustration we restricted f so that f(0) = 0. This was only for convenience of presentation. It can be shown that the form of the solution remains the same with any shift to the function, so that in general the solution takes the form f̂(t) = ξ_0 + Σ_{j=1}^n ξ_j R_j(t), where ξ_0 = η_0. The key points are (i) the elements R_i that allow us to express a function evaluated at a point as an inner-product constraint, and (ii) the restriction to functions in F. F is a very special function space, a reproducing kernel Hilbert space, and R_i is determined by a reproducing kernel R.

Ultimately, our goal is to address more complicated regression problems like the Linear Smoothing Spline Problem. Consider simple regression data (x_i, y_i), i = 1, ..., n, and finding the function that minimizes

(1/n) Σ_{i=1}^n {y_i - f(x_i)}^2 + λ ∫_0^1 f'(x)^2 dx.   (10)

If f(x) is restricted to be in some class of functions F, minimizing only the first term gives least squares estimation within F. If F contains functions with f(x_i) = y_i for all i, such functions minimize the first term but are typically very unsmooth, i.e., they have a large second term. The second penalty term is minimized by a horizontal line, but that rarely has a small first term. As we will see in Section 4, for suitable F the minimizer takes the form

f̂(x) = ξ_0 + Σ_{k=1}^n ξ_k R_k(x),

where the R_k's are known functions and the ξ_k's are coefficients found by solving a system of linear equations. This produces a linear smoothing spline.

If our goal is only to derive the solution to the linear smoothing spline problem with one predictor variable, RKHS theory is overkill. The value of RKHS theory lies in its generality. The linear spline penalty can be replaced by any other penalty with an associated inner product, and the x_i's can be vectors in R^p. Using RKHS results, we can solve the general problem of finding the minimizer of

(1/n) Σ_{i=1}^n (y_i - f(x_i))^2 + λ J(f)

for a general functional J that corresponds to a squared norm in a subspace. See Wahba (1990) or Gu (2002) for a full treatment of this approach. We now present an introduction to this theory.

2 Vector, Banach and Hilbert Spaces

This section presents background material required for the formal development of the RKHS framework.

2.1 Vector Spaces

A vector space is a set that contains elements called vectors and supports two kinds of operations: addition of vectors and multiplication by scalars. The scalars are drawn from some field (the real numbers in the rest of this article) and the vector space is said to be a vector space over that field. Formally, a set V is a vector space over a field F if there exists a structure {V, F, +, ·, 0_v} consisting of V, F, a vector addition operation +, a scalar multiplication ·, and an identity element 0_v ∈ V. This structure must obey the following axioms for any u, v, w ∈ V and a, b ∈ F:

Associative Law: (u + v) + w = u + (v + w).
Commutative Law: u + v = v + u.
Inverse Law: there exists s ∈ V such that u + s = 0_v. (Write s = -u.)
Identity Laws: 0_v + u = u. 1 · u = u.
Distributive Laws: a · (b · u) = (a · b) · u. (a + b) · u = a · u + b · u. a · (u + v) = a · u + a · v.

We write 0 for 0_v ∈ V and write u + (-v) as u - v. Any subset of a vector space that is closed under vector addition and scalar multiplication is called a subspace.

The simplest example of a vector space is R itself, which is a vector space over R. Vector addition and scalar multiplication are just addition and multiplication on R. For more on vector spaces and the other topics to follow in this section see, for example, Naylor & Sell (1982), Young (1988), Máté (1990) and Rustagi (1994).

2.2 Banach Spaces

Definition. A norm on a vector space V, denoted by || · ||, is a nonnegative real valued function satisfying the following properties for all u, v ∈ V and all a ∈ R:

1. Non-negative: ||u|| ≥ 0.
2. Strictly positive: ||u|| = 0 implies u = 0.
3. Homogeneous: ||au|| = |a| ||u||.
4. Triangle inequality: ||u + v|| ≤ ||u|| + ||v||.

Definition. A vector space is called a normed vector space when a norm is defined on the space.

Definition. A sequence {v_n} in a normed vector space V is said to converge to v ∈ V if

lim_{n→∞} ||v_n - v|| = 0.

Definition. A sequence {v_n} ⊂ V is called a Cauchy sequence if for any given ϵ > 0, there exists an integer N such that ||v_m - v_n|| < ϵ whenever m, n ≥ N.

Convergence of sequences in normed vector spaces follows the same general idea as sequences of real numbers, except that the distance between two elements of the space is measured by the norm of the difference between the two elements.

Definition (Banach Space). A normed vector space V is called complete if every Cauchy sequence in V converges to an element of V. A complete normed vector space is called a Banach space.

Example 2.1. R with the absolute value norm ||x|| ≡ |x| is a complete, normed vector space over R, and is thus a Banach space.

Example 2.2. Let x = (x_1, ..., x_n)^t be a point in R^n. The l_p norm on R^n is defined by

||x||_p = [ Σ_{i=1}^n |x_i|^p ]^{1/p}

for 1 ≤ p < ∞. One can verify properties 1-4 at the beginning of this section for each p, validating that ||x||_p is a norm on R^n.

2.3 Hilbert Spaces

A Hilbert space is a Banach space in which the norm is defined by an inner product (also called a dot product), which we define below. We typically denote Hilbert spaces by H. For elements u, v ∈ H, write the inner product of u and v either as ⟨u, v⟩_H or, when it is clear by context that the inner product is taking place in H, as ⟨u, v⟩. If H is a vector space over F, the result of the inner product is an element of F. We have F = R, so the result of an inner product will be a real number. The inner product operation must satisfy four properties for all u, v, w ∈ H and all a ∈ F.

1. Associative: ⟨au, v⟩ = a⟨u, v⟩.
2. Commutative: ⟨u, v⟩ = ⟨v, u⟩.
3. Distributive: ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩.
4. Positive Definite: ⟨u, u⟩ ≥ 0, with equality holding only if u = 0.

Definition. A vector space with an inner product defined on it is called an inner-product space.

The norm of an element u in an inner-product space is taken as ||u|| = ⟨u, u⟩^{1/2}. Two vectors are said to be orthogonal if their inner product is 0, and two sets of vectors are said to be orthogonal if every vector in one is orthogonal to every vector in the other. The set of all vectors orthogonal to a subspace is called the orthogonal complement of the subspace. A complete inner-product space is called a Hilbert space.

Example 2.3. R^n with inner product defined by

⟨u, v⟩ ≡ u^t v = Σ_{i=1}^n u_i v_i

is a Hilbert space. For any positive definite matrix A, ⟨u, v⟩_A ≡ u^t A v also defines a valid inner product.

Example 2.4. Let L_2(a, b) be the vector space of all functions defined on the interval (a, b) that are square integrable, and define the inner product

⟨f, g⟩ ≡ ∫_a^b f(x) g(x) dx.

The inner-product space L_2(a, b) is well known to be complete; see de Barra (1981). Thus L_2(a, b) is a Hilbert space.

3 Reproducing Kernel Hilbert Spaces (RKHSs)

Hilbert spaces that display certain properties on certain linear operators are reproducing kernel Hilbert spaces.

Definition. A function T mapping a vector space X into another vector space Y is called a linear operator if T(λ_1 x_1 + λ_2 x_2) = λ_1 T(x_1) + λ_2 T(x_2) for any x_1, x_2 ∈ X and any λ_1, λ_2 ∈ R. Any m × n matrix A maps vectors in R^n into vectors in R^m via Ax = y and is linear.

Definition. The operator T : X → Y mapping a Banach space into a Banach space is continuous at x_0 ∈ X if and only if for every ϵ > 0 there exists δ = δ(ϵ) > 0 such that for every x with ||x - x_0|| < δ we have ||Tx - Tx_0|| < ϵ. Linear operators are continuous everywhere if they are continuous at 0.

Definition. A real valued function defined on a vector space is called a functional.

A 1 × n matrix defines a linear functional on R^n.

Example 3.1. Let S be the set of bounded real valued continuous functions {f(x)} defined on the real line. Then S is a vector space with the usual + and · operations for functions. Some functionals on S are ϕ(f) = ∫_a^b f(x) dx and ϕ_a(f) = f(a) for some fixed a.

A functional of particular importance is the evaluation functional.

Definition. Let V be a vector space of functions defined from E into R. For any t ∈ E, denote by e_t the evaluation functional at the point t, i.e., for g ∈ V, the mapping is e_t(g) = g(t).

Clearly, evaluation functionals are linear operators. In a Hilbert space (or any normed vector space) of functions, the notion of pointwise convergence is related to the continuity of the evaluation functionals. The following are equivalent for a normed vector space H of real valued functions.

(i) The evaluation functionals are continuous for all t ∈ E.
(ii) If f_n, f ∈ H and ||f_n - f|| → 0, then f_n(t) → f(t) for every t ∈ E.
(iii) For every t ∈ E there exists K_t > 0 such that |f(t)| ≤ K_t ||f|| for all f ∈ H.

Here (ii) is the definition of (i). See Máté (1990) for a proof of (iii).

To define a reproducing kernel, we need the famous Riesz Representation Theorem.

Theorem. Let H be a Hilbert space and let ϕ be a continuous linear functional on H. Then there is one and only one vector g ∈ H such that

ϕ(f) = ⟨f, g⟩, for all f ∈ H.

The vector g is sometimes called the representation of ϕ. However, ϕ and g are different objects: ϕ is a linear functional on H and g is a vector in H. For a proof of this theorem see Naylor & Sell (1982) or Máté (1990).

For H = R^p, vectors can be viewed as functions from the set E = {1, 2, ..., p} into R. An evaluation functional is e_i(x) = x_i. The representation of this linear functional is the indicator vector e_i that is 0 everywhere except for a 1 in the ith place. Then x_i = e_i(x) = x^t e_i. In fact, the entire representation theorem is well known in R^p because for ϕ(x) to be a linear functional there must exist a vector ϕ_0 such that ϕ(x) = ϕ_0^t x.

An element of a set of functions, say f, is sometimes denoted f(·) to be explicit that the elements are functions, whereas f(t) is the value of f(·) evaluated at t ∈ E.

Applying the Riesz Representation Theorem to a Hilbert space H of real valued functions in which evaluation functionals are continuous, for every t ∈ E there is a unique symmetric function R : E × E → R with R(·, t) ∈ H the representation of e_t, so that

f(t) = e_t(f) = ⟨f(·), R(·, t)⟩_H, for all f ∈ H.

The function R is called a reproducing kernel (r.k.) and f(t) = ⟨f(·), R(·, t)⟩ is called the reproducing property of R. In particular, by the reproducing property,

R(s, t) = ⟨R(·, t), R(·, s)⟩.

In Section 1 we found the r.k. for an interpolating spline problem. Other examples follow shortly.

Definition. A Hilbert space H of functions defined on E is called a reproducing kernel Hilbert space if all evaluation functionals are continuous.

The projection principle for a RKHS. Consider the connection between the reproducing kernel R of the RKHS H and the reproducing kernel R_0 for a subspace H_0 ⊂ H. Suppose H is an RKHS with r.k. R, H_0 is a closed subspace of H, and let H_1 be the orthogonal complement of H_0. Then any vector f ∈ H can be written uniquely as f = f_0 + f_1 with f_0 ∈ H_0 and f_1 ∈ H_1. More particularly, R(·, t) = R_0(·, t) + R_1(·, t) with R_0(·, t) ∈ H_0 and R_1(·, t) ∈ H_1 if and only if R_0 is the r.k. of H_0 and R_1 is the r.k. of H_1. For a proof see Gu (2002).

3.1 Examples of Reproducing Kernel Hilbert Spaces

Example 3.2. Consider the space of all constant functionals over x = (x_1, x_2, ..., x_p)^t ∈ R^p,

H = {f_θ : f_θ(x) = θ, θ ∈ R},

with ⟨f_θ, f_λ⟩ = θλ. (For simplicity, think of p = 1.) Since R^p is a Hilbert space, so is H. H has continuous evaluation functionals, so it is an RKHS and has a unique reproducing kernel. To find the r.k., observe that R(·, x) ∈ H, so it is a constant for any x. Write R(x) ≡ R(·, x). By the representation theorem and the defined inner product,

θ = f_θ(x) = ⟨f_θ(·), R(·, x)⟩ = θR(x)

for any x and θ. This implies that R(x) ≡ 1, so that R(·, x) = R(x) ≡ 1 and R(·, ·) ≡ 1.

Example 3.3. Consider all linear functionals over x ∈ R^p passing through the origin,

H = {f_θ : f_θ(x) = θ^t x, θ ∈ R^p}.

Define ⟨f_θ, f_λ⟩ = θ^t λ = θ_1 λ_1 + θ_2 λ_2 + ... + θ_p λ_p. The kernel R must satisfy

f_θ(x) = ⟨f_θ(·), R(·, x)⟩

for all θ and any x.

Since R(·, x) ∈ H, R(v, x) = u^t v for some u that depends on x, i.e., R(·, x) = f_{u(x)}(·), so R(v, x) = u(x)^t v. By our definition of H we have

θ^t x = f_θ(x) = ⟨f_θ(·), R(·, x)⟩ = ⟨f_θ(·), f_{u(x)}(·)⟩ = θ^t u(x),

so we need u(x) such that for any θ and x we have θ^t x = θ^t u(x). It follows that u(x) = x. For example, taking θ to be the indicator vector e_i implies that u_i(x) = x_i for every i = 1, ..., p. We now have R(·, x) = f_x(·), so that

R(x̃, x) = x̃^t x = x̃_1 x_1 + x̃_2 x_2 + ... + x̃_p x_p.

Example 3.4. Now consider all affine functionals in R^p,

H = {f_θ : f_θ(x) = θ_0 + θ_1 x_1 + ... + θ_p x_p, θ ∈ R^{p+1}},

with ⟨f_θ, f_λ⟩ = θ_0 λ_0 + θ_1 λ_1 + ... + θ_p λ_p. The subspace

H_0 = {f_θ ∈ H : θ_0 ∈ R, 0 = θ_1 = ... = θ_p}

has the orthogonal complement

H_1 = {f_θ ∈ H : 0 = θ_0}.

For practical purposes, H_0 is the space of constant functionals from Example 3.2 and H_1 is the space of linear functionals from Example 3.3. Note that the inner product on H, when applied to vectors in H_0 and H_1, respectively, reduces to the inner products used in Examples 3.2 and 3.3. Write H as H = H_0 ⊕ H_1, where ⊕ denotes the direct sum of two vector spaces. For two subspaces A and B contained in a vector space C, the direct sum is the space D = {a + b : a ∈ A, b ∈ B}. Any elements d_1, d_2 ∈ D can be written as a_1 + b_1 and a_2 + b_2, respectively, for some a_1, a_2 ∈ A and b_1, b_2 ∈ B. When the two subspaces are orthogonal, as in our example, those decompositions are unique and the inner product between d_1 and d_2 is ⟨d_1, d_2⟩ = ⟨a_1, a_2⟩ + ⟨b_1, b_2⟩. For more information about direct sum decomposition see Berlinet & Thomas-Agnan (2004) or Gu (2002), for example.

We have already derived the r.k.'s for H_0 and H_1 (call them R_0 and R_1, respectively) in Examples 3.2 and 3.3. Applying the projection principle, the r.k. for H is the sum of R_0 and R_1, i.e.,

R(x̃, x) = 1 + x̃^t x.

Example 3.5. Denote by V the collection of functions f with f'' ∈ L_2[0, 1] and consider the subspace

W_2^0 = {f(x) ∈ V : f, f' absolutely continuous and f(0) = f'(0) = 0}.

Define an inner product on W_2^0 as

⟨f, g⟩ = ∫_0^1 f''(t) g''(t) dt.   (11)

Below we demonstrate that for f ∈ W_2^0 and any s, f(s) can be written as

f(s) = ∫_0^1 (s - u)_+ f''(u) du,   (12)

where (a)_+ equals a for a > 0 and 0 for a ≤ 0. Given any arbitrary and fixed s ∈ [0, 1],

∫_0^1 (s - u)_+ f''(u) du = ∫_0^s (s - u) f''(u) du.

Integrating by parts,

∫_0^s (s - u) f''(u) du = (s - s) f'(s) - (s - 0) f'(0) + ∫_0^s f'(u) du = ∫_0^s f'(u) du,

and applying the Fundamental Theorem of Calculus to the last term,

∫_0^s f'(u) du = f(s) - f(0) = f(s).

Since the r.k. of the space W_2^0 must satisfy f(s) = ⟨f(·), R(·, s)⟩, from (11) and (12) we see that R(·, s) is a function such that

d^2 R(u, s) / du^2 = (s - u)_+.

We also know that R(·, s) ∈ W_2^0, so using R(s, t) = ⟨R(·, t), R(·, s)⟩,

R(s, t) = ∫_0^1 (t - u)_+ (s - u)_+ du = max(s, t) min^2(s, t) / 2 - min^3(s, t) / 6.

For further examples of RKHSs with various inner products, see Berlinet & Thomas-Agnan (2004).
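The closed form just derived is easy to check numerically; the following R sketch is ours, not part of the original text, and compares the integral definition of the kernel with the closed-form expression at one pair of points:

# Numerical check of the r.k. of W_2^0: integral form vs. closed form
R.closed <- function(s, t) max(s, t) * min(s, t)^2 / 2 - min(s, t)^3 / 6
R.integral <- function(s, t)
  integrate(function(u) pmax(t - u, 0) * pmax(s - u, 0), lower = 0, upper = 1)$value
s <- 0.3; t <- 0.7
c(R.integral(s, t), R.closed(s, t))   # both approximately 0.027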

4 Penalized Regression with RKHSs

Nonparametric regression is a powerful approach for solving many modern problems, such as computer modeling, image processing, and environmental monitoring, to name a few. The nonparametric regression model is given by

y_i = f(x_i) + ϵ_i, i = 1, 2, ..., n,

where f is an unknown regression function and the ϵ_i are independent error terms. We start this section with two common examples of penalized regression: ridge regression and smoothing splines.

Ridge Regression. In the classical linear regression setting y_i = x_i^t β + ϵ_i, the ridge regression estimator β̂_R proposed by Hoerl & Kennard (1970a) minimizes

min_β Σ_{i=1}^n ( y_i - β_0 - Σ_{j=1}^p β_j x_ij )^2 + λ Σ_{j=1}^p β_j^2,   (13)

where x_ij is the ith observation of the jth component. The resulting estimate is biased but can reduce the variance relative to least squares estimates. The tuning parameter λ is a constant that controls the trade-off between bias and variance in β̂_R, and is often selected by some form of cross validation; see Section 4.4.
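For readers who want to connect (13) with familiar formulas before the RKHS treatment in Section 4.3, the sketch below is ours, not from the paper. It leaves the intercept unpenalized by centering and then applies the standard closed form for the centered problem, β̂ = (X^t X + λI)^{-1} X^t y:

# Direct ridge regression solution of (13), with an unpenalized intercept (our sketch)
ridge.direct <- function(X, y, lambda){
  Xc <- scale(X, center = TRUE, scale = FALSE)          # center the predictors
  yc <- y - mean(y)                                     # center the response
  p  <- ncol(X)
  beta  <- solve(t(Xc) %*% Xc + lambda * diag(p), t(Xc) %*% yc)
  beta0 <- mean(y) - sum(colMeans(X) * beta)            # recover the intercept
  c(beta0, beta)
}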

Smoothing Splines. Smoothing splines are among the most popular methods for the estimation of f, due to their good empirical performance and sound theoretical support. It is often assumed, without loss of generality, that the domain of f is [0, 1]. With f^(m) the m-th derivative of f, a smoothing spline estimate f̂ is the unique minimizer of

Σ_{i=1}^n {y_i - f(x_i)}^2 + λ ∫_0^1 [ f^(m)(x) ]^2 dx.   (14)

The minimization of (14) is implicitly over functions with square integrable m-th derivatives. The first term of (14) encourages the fitted f to be close to the data, while the second term penalizes the roughness of f. The smoothing parameter λ, usually pre-specified, controls the trade-off between the two conflicting goals. The special case of m = 1 reduces to the linear smoothing spline problem from (10). In practice it is common to choose m = 2, in which case the minimizer f̂_λ of (14) is called a cubic smoothing spline. As λ → ∞, f̂_λ approaches the least squares simple linear regression line, while as λ → 0, f̂_λ approaches the minimum curvature interpolant.

4.1 Solving the General Penalized Regression Problem

We now review a general framework to minimize (13), (14) and many other similar problems, cf. (O'Sullivan, Yandell & Raynor 1986, Lin & Zhang 2006, Storlie, Bondell, Reich & Zhang 2010, Storlie, Bondell & Reich 2010, Gu & Qiu 1993). The data model is

y_i = f(x_i) + ϵ_i, i = 1, 2, ..., n,   (15)

where the ϵ_i are error terms and f ∈ V, a given vector space of functions on a set E. An estimate of f is obtained by minimizing

(1/n) Σ_{i=1}^n {y_i - f(x_i)}^2 + λ J(f)   (16)

over f ∈ V, where J is a penalty functional that must satisfy several restrictions and that helps to define the vector space V. We require (i) that J(f) ≥ 0 for any f in some vector space Ṽ; (ii) that the null set N = {f ∈ Ṽ : J(f) = 0} be a subspace, i.e., that it be closed under vector addition and scalar multiplication; (iii) that for f_N ∈ N and f ∈ Ṽ, we have J(f_N + f) = J(f); and (iv) that there exists an RKHS H contained in Ṽ for which the inner product satisfies ⟨f, f⟩ = J(f). This condition forces the intersection of N and H to contain only the zero vector. Define a finite dimensional subspace of N, say N_0, with a basis of known functions, say {ϕ_1, ..., ϕ_M}, and M ≤ n. In applications N is often finite dimensional, and we can simply take N_0 = N. In the minimization problem, we restrict attention to f ∈ V where V ≡ N_0 ⊕ H.

For Example 3.5, Ṽ consists of functions in L_2[0, 1] with finite values of J(f) ≡ ∫_0^1 [f''(t)]^2 dt. J(f) satisfies our four conditions with H = W_2^0. The linear functions f(x) = a + bx are in N, so we can take ϕ_1(x) ≡ 1 and ϕ_2(x) = x. Note that equation (11) defines an inner product on W_2^0 but does not define an inner product on all of Ṽ, because nonzero functions can have a zero inner product with themselves, hence the nontrivial nature of N.

The key result is that the minimizer of (16) is a linear combination of known functions involving the reproducing kernel on H. This fact will allow us to find the coefficients of the linear combination by solving a quadratic minimization problem similar to those in standard linear models.

Representation Theorem. The minimizer f̂_λ of equation (16) has the form

f̂_λ(x) = Σ_{j=1}^M d_j ϕ_j(x) + Σ_{i=1}^n c_i R(x_i, x),   (17)

where R(s, t) is the r.k. for H.

An informal proof is given below; see Wahba (1990) or Gu (2002) for a formal proof. Since we are working in V, clearly a minimizer f̂ must have f̂ = f̂_0 + f̂_1 with f̂_0 ∈ N_0 and f̂_1 ∈ H. We want to establish that f̂_1(·) = Σ_{i=1}^n c_i R(x_i, ·). To simplify notation, write

f̂_R(·) ≡ Σ_{i=1}^n c_i R(x_i, ·).   (18)

Decompose H as H = H_R ⊕ H_R^⊥, where H_R ≡ span{R(x_i, ·), i = 1, ..., n} and H_R^⊥ is its orthogonal complement in H, so that f̂_1(·) = f̂_R(·) + η(·) with η(·) ∈ H_R^⊥. By orthogonality and the reproducing property of the r.k., 0 = ⟨R(x_i, ·), η(·)⟩ = η(x_i). We now establish the representation theorem. Using our assumptions about J,

(1/n) Σ_{i=1}^n {y_i - f̂(x_i)}^2 + λ J(f̂)
  = (1/n) Σ_{i=1}^n {y_i - f̂_0(x_i) - f̂_1(x_i)}^2 + λ J(f̂_0 + f̂_1)
  = (1/n) Σ_{i=1}^n {y_i - f̂_0(x_i) - f̂_1(x_i)}^2 + λ J(f̂_1)
  = (1/n) Σ_{i=1}^n {y_i - f̂_0(x_i) - f̂_R(x_i) - η(x_i)}^2 + λ J(f̂_R + η).

Because η(x_i) = 0, and using orthogonality within H,

(1/n) Σ_{i=1}^n {y_i - f̂(x_i)}^2 + λ J(f̂)
  = (1/n) Σ_{i=1}^n {y_i - f̂_0(x_i) - f̂_R(x_i)}^2 + λ [ J(f̂_R) + J(η) ]
  ≥ (1/n) Σ_{i=1}^n {y_i - f̂_0(x_i) - f̂_R(x_i)}^2 + λ J(f̂_R).

Clearly, any η ≠ 0 makes the inequality strict, so minimizers have η = 0 and f̂ = f̂_0 + f̂_R, with the last inequality an equality.

Once we know that the minimizer takes the form (17), we can find the coefficients of the linear combination by solving a quadratic minimization problem similar to those in standard linear models. This occurs because we can write J(f̂) = J(f̂_R) as a quadratic form in c = (c_1, ..., c_n)^t. Define Σ as the n × n matrix whose i, j entry is Σ_ij = R(x_i, x_j). Now, using the reproducing property of R, write

J(f̂_R) = ⟨ Σ_{i=1}^n c_i R(x_i, ·), Σ_{j=1}^n c_j R(x_j, ·) ⟩ = Σ_{i=1}^n Σ_{j=1}^n c_i c_j R(x_i, x_j) = c^t Σ c.

Define the observation vector y = [y_1, ..., y_n]^t, and let T be the n × M matrix with ij-th entry defined by T_ij = ϕ_j(x_i). The minimization of (16) then takes the form

min_{c,d} (1/n) ||y - (Td + Σc)||^2 + λ c^t Σ c.   (19)

To solve (19) we define the following matrices,

Q_{n×(n+M)} = [ T_{n×M}  Σ_{n×n} ],   γ_{(n+M)×1} = [ d_{M×1} ; c_{n×1} ],   S_{(n+M)×(n+M)} = [ 0_{M×M}  0_{M×n} ; 0_{n×M}  Σ_{n×n} ].

Now (19) becomes

min_γ (1/n) ||y - Qγ||^2 + λ γ^t S γ.   (20)

The minimization in (20) is the same as a generalized ridge regression. Taking derivatives with respect to γ, we have

(Q^t Q + λS) γ̂ = Q^t y,

which requires solving a system of n + M equations to find γ̂. For analytical purposes, we can write γ̂ as

γ̂ = (Q^t Q + λS)^- Q^t y.   (21)

As long as T is of full column rank, then f̂_λ is unique. If the x_i are not unique, then γ̂ is not unique as defined, but f̂_λ is still unique. One could simply use only the unique x_i in the definition of f̂_R in (18) to ensure that γ̂ is unique in that case as well.

Alternatively, γ̂ can be obtained as the generalized least squares estimate from fitting the linear model

[ y ; 0 ] = [ T  Σ ; 0_{n×M}  I_{n×n} ] [ d ; c ] + e,   Cov(e) = [ I_{n×n}  0_{n×n} ; 0_{n×n}  λ^{-1} Σ^{-1}_{n×n} ].
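In R, once T, Σ and y have been constructed, (20)-(21) amount to a single linear solve. The sketch below is ours (the paper's full worked code is in Appendix A); it uses solve() and therefore assumes Q^t Q + λS is nonsingular, whereas the appendix uses a pseudo-inverse:

# Generalized ridge solve for gamma.hat in (21); Tmat is n x M, Sigma is n x n
fit.penalized <- function(Tmat, Sigma, y, lambda){
  n <- nrow(Sigma); M <- ncol(Tmat)
  Q <- cbind(Tmat, Sigma)                        # Q = [T  Sigma]
  S <- matrix(0, n + M, n + M)
  S[(M+1):(n+M), (M+1):(n+M)] <- Sigma           # Sigma in the lower right block of S
  gamma.hat <- solve(t(Q) %*% Q + lambda * S, t(Q) %*% y)
  list(d.hat = gamma.hat[1:M],                   # null space coefficients
       c.hat = gamma.hat[-(1:M)],                # kernel coefficients
       f.hat = Q %*% gamma.hat)                  # fitted values at the observed x_i
}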

For clarity, we have restricted our attention to minimizing (16), which incorporates squared error loss between the observations and the unknown function evaluations. The representation theorem holds for more general loss functions (e.g., those from logistic or Poisson regression); see Wahba (1990) and Gu (2002).

4.2 General Solution Applied to Cubic Smoothing Spline

Consider again the regression problem y_i = f(x_i) + ϵ_i, i = 1, 2, ..., n, where x_i ∈ [0, 1] and ϵ_i ~ N(0, σ^2). We focus on the cubic smoothing spline solution to this problem. That is, we find a function that minimizes

Σ_{i=1}^n {y_i - f(x_i)}^2 + λ ∫_0^1 f''(x)^2 dx.

As discussed in Subsection 4.1, N is the space of linear functions from (23) with p = 1 and a basis of ϕ_1(x) = 1, ϕ_2(x) = x. In this case N happens to be an RKHS, but in general it is not even necessary to define an inner product on N. H = W_2^0 comes from Example 3.5. We know that the reproducing kernel for H is

R(s, t) = ∫_0^1 (t - u)_+ (s - u)_+ du = max(s, t) min^2(s, t) / 2 - min^3(s, t) / 6,

so the solution has the form

f̂(x) = d̂_1 + d̂_2 x + Σ_{i=1}^n ĉ_i R(x_i, x).   (22)

From (21) we have

(Q^t Q + λS)^{-1} Q^t y = [ d̂ ; ĉ ].

Appendix A.2 provides a demonstration with R code of fitting the cubic smoothing spline using the solution in (22) on some actual data. The demonstration also includes searching for the best value of the tuning parameter λ, which is briefly discussed in Section 4.4.

4.3 General Solution Applied to Ridge Regression

We now solve the linear ridge regression problem of minimizing (13) with the RKHS approach detailed above. Although the RKHS framework is not necessary to solve the ridge regression problem, it serves as a good illustration of the RKHS machinery. To put the ridge regression problem in the framework of (16), consider

Ṽ = {f(x) = β_0 + Σ_{j=1}^p β_j x_j}   (23)

with the penalty function

J(f) = Σ_{j=1}^p β_j^2.

Notice that by letting Ṽ = V_0 ⊕ H, where V_0 is the RKHS from Example 3.2 and H is from Example 3.3, we have J(f) = ⟨f_1, f_1⟩_H. The dimension of V_0 = N = N_0 is M = 1, and ϕ_1(x) = 1. From (17) the solution takes the form

f̂_λ(x) = d̂_1 + Σ_{i=1}^n ĉ_i R(x_i, x)   (24)

with

(Q^t Q + λS)^{-1} Q^t y = [ d̂_1 ; ĉ ]   (25)

as given by (21). In this case it is more familiar to write the solution as

f̂_λ(x) = β̂_0 + β̂_1 x_1 + β̂_2 x_2 + ... + β̂_p x_p.

This can be done by recalling from Example 3.3 that R(x_i, x) = x_i1 x_1 + x_i2 x_2 + ... + x_ip x_p. Substituting into (24) gives

f̂(x) = d̂_1 + ( Σ_{i=1}^n ĉ_i x_i1 ) x_1 + ( Σ_{i=1}^n ĉ_i x_i2 ) x_2 + ... + ( Σ_{i=1}^n ĉ_i x_ip ) x_p,

which implies that

β̂_0 = d̂_1 and β̂_j = Σ_{i=1}^n ĉ_i x_ij, for j = 1, 2, ..., p.

In Appendix A.1, we provide a demonstration of this solution on some actual data using the statistical software R.

Both of the previous examples illustrate a typical case in which Ṽ is constructed as the direct sum of a RKHS V_0 and a RKHS H whose intersection is the zero vector, Ṽ = V_0 ⊕ H. Because the intersection is zero, any f ∈ Ṽ can be written uniquely as f = f_0 + f_1 with f_0 ∈ V_0 and f_1 ∈ H. The importance of this is that if the penalty functional J happens to satisfy J(f) = ⟨f_1, f_1⟩_H, then all of our assumptions about J hold immediately with N = V_0, and the solution in (21) can be applied. As a further aside, when V_0 is itself a RKHS, then Ṽ is an RKHS under the inner product ⟨f, g⟩_Ṽ ≡ ⟨f_0, g_0⟩_{V_0} + ⟨f_1, g_1⟩_H. Note that V_0 and H are orthogonal under this inner product. This orthogonal decomposition of Ṽ is also closely related to additive models and, more generally, smoothing spline ANOVA models (Gu 2002).

4.4 Choosing the Degree of Smoothness

With the penalized regression procedures described above, the choice of the smoothing parameter λ is an important issue. There are many methods available for this task, e.g., visual inspection of the fit, cross-validation, generalized MLE, and generalized cross-validation (GCV). For the examples given in the appendix, we use the GCV approach, which works as follows for any generalized ridge regression solution, such as those in (25) and (22). Suppose that an estimate admits the following closed-form expression:

β̂ = (Q^t Q + λS)^{-1} Q^t y.

The GCV choice of λ for this generalized ridge estimate is the minimizer of

V(λ) = (1/n) ||(I - A(λ)) y||^2 / [ (1/n) Trace{I - A(λ)} ]^2,

where A(λ) = Q (Q^t Q + λS)^{-1} Q^t. The goal of GCV is to find the optimal λ so that the resulting β̂ has the smallest mean squared error. For more details about GCV and other methods of finding λ, see Golub, Heath & Wahba (1979), Allen (1974), Wecker & Ansley (1983) and Wahba (1990).

5 Concluding Remarks

We have given several examples motivating the utility of the RKHS approach to penalized regression problems. We reviewed the building blocks necessary to define an RKHS and presented several key results about these spaces. Finally, we used the results to illustrate estimation for ridge regression and the cubic smoothing spline problems, providing transparent R code to enhance understanding.

References

Allen, D. (1974), The relationship between variable selection and data augmentation and a method for prediction, Technometrics 16.

Berlinet, A. & Thomas-Agnan, C. (2004), Reproducing Kernel Hilbert Spaces in Probability and Statistics, Kluwer Academic Publishers, Norwell, MA.

de Barra, G. (1981), Measure Theory and Integration, Horwood Publishing, West Sussex, UK.

Eubank, R. (1999), Nonparametric Regression and Spline Smoothing, Marcel Dekker, New York, NY.

Golub, G., Heath, M. & Wahba, G. (1979), Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics 21.

Gu, C. (2002), Smoothing Spline ANOVA Models, Springer-Verlag, New York, NY.

Gu, C. & Qiu, C. (1993), Smoothing spline density estimation: Theory, Annals of Statistics 21.

Hastie, T., Tibshirani, R. & Friedman, J. (2001), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, New York, NY.

Hoerl, A. & Kennard, R. (1970a), Ridge regression: Biased estimation for nonorthogonal problems, Technometrics 12.

Lin, Y. & Zhang, H. (2006), Component selection and smoothing in smoothing spline analysis of variance models, Annals of Statistics 34.

Máté, L. (1990), Hilbert Space Methods in Science and Engineering, Taylor & Francis, Oxford, UK.

Naylor, A. & Sell, G. (1982), Linear Operator Theory in Engineering and Science, Springer, New York, NY.

O'Sullivan, F., Yandell, B. & Raynor, W. (1986), Automatic smoothing of regression functions in generalized linear models, Journal of the American Statistical Association 81.

Rustagi, J. (1994), Optimization Techniques in Statistics, Academic Press, London, UK.

Storlie, C., Bondell, H. & Reich, B. (2010), A locally adaptive penalty for estimation of functions with varying roughness, Journal of Computational and Graphical Statistics 19.

Storlie, C., Bondell, H., Reich, B. & Zhang, H. (2010), Surface estimation, variable selection, and the nonparametric oracle property, Statistica Sinica (to appear).

Wahba, G. (1990), Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 59, SIAM, Philadelphia, PA.

Wecker, W. & Ansley, C. (1983), The signal extraction approach to nonlinear regression and spline smoothing, Journal of the American Statistical Association 78.

Young, N. (1988), An Introduction to Hilbert Space, Cambridge University Press, Cambridge, UK.

A Examples using R

A.1 RKHS solution applied to Ridge Regression

We consider a sample of size n = 20, (y_1, y_2, y_3, ..., y_20), from the model y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ϵ_i, where β_0 = 2, β_1 = 3, β_2 = 0 and ϵ_i has a N(0, .25^2) distribution. The distribution of the covariates, x_1 and x_2, is uniform on [0, 1]^2, so that x_2 is uninformative. The following lines of code generate the data.

###### Data ########
set.seed(3)
n <- 20
x1 <- runif(n)
x2 <- runif(n)
X <- matrix(c(x1, x2), ncol = 2)      # design matrix
y <- 2 + 3*x1 + rnorm(n, sd = .25)

Below we give a function ridge.regression to solve the ridge regression problem using the RKHS framework we discussed in Section 4.3.

##### function to find the (pseudo) inverse of a matrix ####
my.inv <- function(X, eps = 1e-12){
  eig.X <- eigen(X, symmetric = TRUE)
  P <- eig.X[[2]]                     # eigenvectors
  lambda <- eig.X[[1]]                # eigenvalues
  ind <- lambda > eps
  lambda[ind] <- 1/lambda[ind]
  lambda[!ind] <- 0
  ans <- P %*% diag(lambda, nrow = length(lambda)) %*% t(P)
  return(ans)
}

###### Reproducing Kernel #########
rk <- function(s, t){
  p <- length(s)
  rk <- 0
  for (i in 1:p){
    rk <- s[i]*t[i] + rk
  }
  return(rk)
}

##### Gramm matrix #######
get.gramm <- function(X){
  n <- dim(X)[1]
  Gramm <- matrix(0, n, n)            # initializes Gramm matrix
  # i = index for rows
  # j = index for columns
  Gramm <- as.matrix(Gramm)           # Gramm matrix
  for (i in 1:n){
    for (j in 1:n){
      Gramm[i, j] <- rk(X[i, ], X[j, ])
    }
  }
  return(Gramm)
}

ridge.regression <- function(X, y, lambda){
  Gramm <- get.gramm(X)                        # Gramm matrix (n x n)
  n <- dim(X)[1]                               # n = length of y
  J <- matrix(1, n, 1)                         # vector of ones, dim n x 1
  Q <- cbind(J, Gramm)                         # design matrix
  m <- 1                                       # dimension of the null space of the penalty
  S <- matrix(0, n + m, n + m)                 # initialize S
  S[(m+1):(n+m), (m+1):(n+m)] <- Gramm         # non-zero part of S
  M <- (t(Q) %*% Q + lambda*S)
  M.inv <- my.inv(M)                           # inverse of M
  gamma.hat <- crossprod(M.inv, crossprod(Q, y))
  f.hat <- Q %*% gamma.hat
  A <- Q %*% M.inv %*% t(Q)
  tr.A <- sum(diag(A))                         # trace of hat matrix
  rss <- t(y - f.hat) %*% (y - f.hat)          # residual sum of squares
  gcv <- n*rss/(n - tr.A)^2                    # obtain GCV score
  return(list(f.hat = f.hat, gamma.hat = gamma.hat, gcv = gcv))
}

A simple direct search for the GCV optimal smoothing parameter can be made as follows:

# Plot of GCV
lambda <- 1e-8
V <- rep(0, 40)
for (i in 1:40){
  V[i] <- ridge.regression(X, y, lambda)$gcv   # obtain GCV score
  lambda <- lambda*1.5                         # increase lambda
}
index <- (1:40)
plot(1.5^(index - 1)*1e-8, V, type = "l", main = "GCV score", lwd = 2,
     xlab = "lambda", ylab = "GCV")            # plot score

The GCV plot produced by this code is displayed in Figure 2. Now, by following Section 4.3, β̂_0, β̂_1 and β̂_2 can be obtained as follows.

i <- (1:40)[V == min(V)]                             # extract index of min(V)
opt.mod <- ridge.regression(X, y, 1.5^(i - 1)*1e-8)  # fit optimal model

### finding beta.0, beta.1 and beta.2 ##########
gamma.hat <- opt.mod$gamma.hat
beta.hat.0 <- opt.mod$gamma.hat[1]                   # intercept
beta.hat <- gamma.hat[2:21, ] %*% X                  # slope and noise term coefficients

The resulting estimates are: β̂_0 = , β̂_1 = and β̂_2 = .

A.2 RKHS solution applied to Cubic Smoothing Spline

We consider a sample of size n = 50, (y_1, y_2, y_3, ..., y_50), from the model y_i = sin(2πx_i) + ϵ_i, where ϵ_i has a N(0, .2^2) distribution. The following code generates x and y.

###### Data ########
set.seed(3)
n <- 50
x <- matrix(runif(n), nrow = n, ncol = 1)
x.star <- matrix(sort(x), nrow = n, ncol = 1)   # sorted x, used by plot
y <- sin(2*pi*x.star) + rnorm(n, sd = .2)

Below we give a function to find the cubic smoothing spline using the RKHS framework we discussed in Section 4.2. We also provide a graph of our estimate along with the true function and data.

#### Reproducing Kernel for <f,g> = int_0^1 f''(x)g''(x)dx #####
rk.1 <- function(s, t){
  return((1/2)*(min(s, t)^2)*max(s, t) - (1/6)*(min(s, t))^3)
}

get.gramm.1 <- function(X){
  n <- dim(X)[1]
  Gramm <- matrix(0, n, n)            # initializes Gramm matrix
  # i = index for rows
  # j = index for columns
  Gramm <- as.matrix(Gramm)           # Gramm matrix
  for (i in 1:n){
    for (j in 1:n){
      Gramm[i, j] <- rk.1(X[i, ], X[j, ])
    }
  }
  return(Gramm)
}

smoothing.spline <- function(X, y, lambda){
  Gramm <- get.gramm.1(X)                      # Gramm matrix (n x n)
  n <- dim(X)[1]                               # n = length of y
  J <- matrix(1, n, 1)                         # vector of ones, dim n x 1
  T <- cbind(J, X)                             # matrix with a basis for the null space of the penalty
  Q <- cbind(T, Gramm)                         # design matrix
  m <- dim(T)[2]                               # dimension of the null space of the penalty
  S <- matrix(0, n + m, n + m)                 # initialize S
  S[(m+1):(n+m), (m+1):(n+m)] <- Gramm         # non-zero part of S
  M <- (t(Q) %*% Q + lambda*S)
  M.inv <- my.inv(M)                           # inverse of M
  gamma.hat <- crossprod(M.inv, crossprod(Q, y))
  f.hat <- Q %*% gamma.hat
  A <- Q %*% M.inv %*% t(Q)
  tr.A <- sum(diag(A))                         # trace of hat matrix
  rss <- t(y - f.hat) %*% (y - f.hat)          # residual sum of squares
  gcv <- n*rss/(n - tr.A)^2                    # obtain GCV score
  return(list(f.hat = f.hat, gamma.hat = gamma.hat, gcv = gcv))
}

A simple direct search for the GCV optimal smoothing parameter can be made as follows:

### Now we have to find an optimal lambda using GCV...
### Plot of GCV
lambda <- 1e-8
V <- rep(0, 60)
for (i in 1:60){
  V[i] <- smoothing.spline(x.star, y, lambda)$gcv   # obtain GCV score
  lambda <- lambda*1.5                              # increase lambda
}
plot(1:60, V, type = "l", main = "GCV score", xlab = "i")   # plot score

i <- (1:60)[V == min(V)]                                    # extract index of min(V)
opt.mod.2 <- smoothing.spline(x.star, y, 1.5^(i - 1)*1e-8)  # fit optimal model

# Graph (Cubic Spline)
plot(x.star, opt.mod.2$f.hat, type = "l", lty = 2, lwd = 2, col = "blue",
     xlab = "x", ylim = c(-2.5, 1.5), xlim = c(-0.1, 1.1), ylab = "response",
     main = "Cubic Spline")                         # predictions
lines(x.star, sin(2*pi*x.star), lty = 1, lwd = 2)   # true function
legend(-0.1, -1.5, c("predictions", "true"), lty = c(2, 1), bty = "n",
       lwd = c(2, 2), col = c("blue", "black"))
points(x.star, y)                                   # data

This graph is plotted in Figure 3.
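The appendix code evaluates the fitted spline only at the observed x values. As a small extension (ours, not from the original paper; the names predict.ss and x.grid are hypothetical), the coefficients in opt.mod.2$gamma.hat can be plugged into (22) to predict at new points and overlay a smooth curve on the plot above:

# Evaluate the fitted cubic smoothing spline at new x values using (22)
predict.ss <- function(gamma.hat, X, x.new){
  d <- gamma.hat[1:2]                       # null space coefficients d.hat
  c.hat <- gamma.hat[-(1:2)]                # kernel coefficients c.hat
  sapply(x.new, function(x0){
    d[1] + d[2]*x0 + sum(c.hat * sapply(X, function(xi) rk.1(xi, x0)))
  })
}
x.grid <- seq(0, 1, length = 200)
lines(x.grid, predict.ss(opt.mod.2$gamma.hat, x.star, x.grid), col = "red")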

Figure 1: Linear Interpolating Spline

Figure 2: GCV Score

Figure 3: Cubic Spline


More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

ELEMENTARY LINEAR ALGEBRA

ELEMENTARY LINEAR ALGEBRA ELEMENTARY LINEAR ALGEBRA K R MATTHEWS DEPARTMENT OF MATHEMATICS UNIVERSITY OF QUEENSLAND First Printing, 99 Chapter LINEAR EQUATIONS Introduction to linear equations A linear equation in n unknowns x,

More information

LECTURE NOTES ELEMENTARY NUMERICAL METHODS. Eusebius Doedel

LECTURE NOTES ELEMENTARY NUMERICAL METHODS. Eusebius Doedel LECTURE NOTES on ELEMENTARY NUMERICAL METHODS Eusebius Doedel TABLE OF CONTENTS Vector and Matrix Norms 1 Banach Lemma 20 The Numerical Solution of Linear Systems 25 Gauss Elimination 25 Operation Count

More information

A Brief Outline of Math 355

A Brief Outline of Math 355 A Brief Outline of Math 355 Lecture 1 The geometry of linear equations; elimination with matrices A system of m linear equations with n unknowns can be thought of geometrically as m hyperplanes intersecting

More information

Representer Theorem. By Grace Wahba and Yuedong Wang. Abstract. The representer theorem plays an outsized role in a large class of learning problems.

Representer Theorem. By Grace Wahba and Yuedong Wang. Abstract. The representer theorem plays an outsized role in a large class of learning problems. Representer Theorem By Grace Wahba and Yuedong Wang Abstract The representer theorem plays an outsized role in a large class of learning problems. It provides a means to reduce infinite dimensional optimization

More information

Introduction to Real Analysis Alternative Chapter 1

Introduction to Real Analysis Alternative Chapter 1 Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces

More information

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2.

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2. Chapter 1 LINEAR EQUATIONS 11 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,, a n, b are given real

More information

UNBOUNDED OPERATORS ON HILBERT SPACES. Let X and Y be normed linear spaces, and suppose A : X Y is a linear map.

UNBOUNDED OPERATORS ON HILBERT SPACES. Let X and Y be normed linear spaces, and suppose A : X Y is a linear map. UNBOUNDED OPERATORS ON HILBERT SPACES EFTON PARK Let X and Y be normed linear spaces, and suppose A : X Y is a linear map. Define { } Ax A op = sup x : x 0 = { Ax : x 1} = { Ax : x = 1} If A

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Kernel Method: Data Analysis with Positive Definite Kernels

Kernel Method: Data Analysis with Positive Definite Kernels Kernel Method: Data Analysis with Positive Definite Kernels 2. Positive Definite Kernel and Reproducing Kernel Hilbert Space Kenji Fukumizu The Institute of Statistical Mathematics. Graduate University

More information

LINEAR ALGEBRA W W L CHEN

LINEAR ALGEBRA W W L CHEN LINEAR ALGEBRA W W L CHEN c W W L Chen, 1997, 2008. This chapter is available free to all individuals, on the understanding that it is not to be used for financial gain, and may be downloaded and/or photocopied,

More information

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x =

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x = Linear Algebra Review Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1 x x = 2. x n Vectors of up to three dimensions are easy to diagram.

More information

Elementary linear algebra

Elementary linear algebra Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The

More information

Kernel Methods. Barnabás Póczos

Kernel Methods. Barnabás Póczos Kernel Methods Barnabás Póczos Outline Quick Introduction Feature space Perceptron in the feature space Kernels Mercer s theorem Finite domain Arbitrary domain Kernel families Constructing new kernels

More information

SPRING 2006 PRELIMINARY EXAMINATION SOLUTIONS

SPRING 2006 PRELIMINARY EXAMINATION SOLUTIONS SPRING 006 PRELIMINARY EXAMINATION SOLUTIONS 1A. Let G be the subgroup of the free abelian group Z 4 consisting of all integer vectors (x, y, z, w) such that x + 3y + 5z + 7w = 0. (a) Determine a linearly

More information

MAT Linear Algebra Collection of sample exams

MAT Linear Algebra Collection of sample exams MAT 342 - Linear Algebra Collection of sample exams A-x. (0 pts Give the precise definition of the row echelon form. 2. ( 0 pts After performing row reductions on the augmented matrix for a certain system

More information

,, rectilinear,, spherical,, cylindrical. (6.1)

,, rectilinear,, spherical,, cylindrical. (6.1) Lecture 6 Review of Vectors Physics in more than one dimension (See Chapter 3 in Boas, but we try to take a more general approach and in a slightly different order) Recall that in the previous two lectures

More information

Fundamentals of Linear Algebra. Marcel B. Finan Arkansas Tech University c All Rights Reserved

Fundamentals of Linear Algebra. Marcel B. Finan Arkansas Tech University c All Rights Reserved Fundamentals of Linear Algebra Marcel B. Finan Arkansas Tech University c All Rights Reserved 2 PREFACE Linear algebra has evolved as a branch of mathematics with wide range of applications to the natural

More information

Lecture 3: Review of Linear Algebra

Lecture 3: Review of Linear Algebra ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak, scribe: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters,

More information

Effective Dimension and Generalization of Kernel Learning

Effective Dimension and Generalization of Kernel Learning Effective Dimension and Generalization of Kernel Learning Tong Zhang IBM T.J. Watson Research Center Yorktown Heights, Y 10598 tzhang@watson.ibm.com Abstract We investigate the generalization performance

More information

1 Matrices and Systems of Linear Equations

1 Matrices and Systems of Linear Equations March 3, 203 6-6. Systems of Linear Equations Matrices and Systems of Linear Equations An m n matrix is an array A = a ij of the form a a n a 2 a 2n... a m a mn where each a ij is a real or complex number.

More information

Systems of Linear Equations

Systems of Linear Equations Systems of Linear Equations Math 108A: August 21, 2008 John Douglas Moore Our goal in these notes is to explain a few facts regarding linear systems of equations not included in the first few chapters

More information

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Linear Algebra A Brief Reminder Purpose. The purpose of this document

More information

x 1 x 2. x 1, x 2,..., x n R. x n

x 1 x 2. x 1, x 2,..., x n R. x n WEEK In general terms, our aim in this first part of the course is to use vector space theory to study the geometry of Euclidean space A good knowledge of the subject matter of the Matrix Applications

More information

(i) [7 points] Compute the determinant of the following matrix using cofactor expansion.

(i) [7 points] Compute the determinant of the following matrix using cofactor expansion. Question (i) 7 points] Compute the determinant of the following matrix using cofactor expansion 2 4 2 4 2 Solution: Expand down the second column, since it has the most zeros We get 2 4 determinant = +det

More information

Vectors. January 13, 2013

Vectors. January 13, 2013 Vectors January 13, 2013 The simplest tensors are scalars, which are the measurable quantities of a theory, left invariant by symmetry transformations. By far the most common non-scalars are the vectors,

More information

MSA220/MVE440 Statistical Learning for Big Data

MSA220/MVE440 Statistical Learning for Big Data MSA220/MVE440 Statistical Learning for Big Data Lecture 9-10 - High-dimensional regression Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Recap from

More information

QUADRATIC MAJORIZATION 1. INTRODUCTION

QUADRATIC MAJORIZATION 1. INTRODUCTION QUADRATIC MAJORIZATION JAN DE LEEUW 1. INTRODUCTION Majorization methods are used extensively to solve complicated multivariate optimizaton problems. We refer to de Leeuw [1994]; Heiser [1995]; Lange et

More information

Introduction - Motivation. Many phenomena (physical, chemical, biological, etc.) are model by differential equations. f f(x + h) f(x) (x) = lim

Introduction - Motivation. Many phenomena (physical, chemical, biological, etc.) are model by differential equations. f f(x + h) f(x) (x) = lim Introduction - Motivation Many phenomena (physical, chemical, biological, etc.) are model by differential equations. Recall the definition of the derivative of f(x) f f(x + h) f(x) (x) = lim. h 0 h Its

More information

Vector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms.

Vector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms. Vector Spaces Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms. For each two vectors a, b ν there exists a summation procedure: a +

More information

Theorems. Least squares regression

Theorems. Least squares regression Theorems In this assignment we are trying to classify AML and ALL samples by use of penalized logistic regression. Before we indulge on the adventure of classification we should first explain the most

More information

ELEMENTARY LINEAR ALGEBRA

ELEMENTARY LINEAR ALGEBRA ELEMENTARY LINEAR ALGEBRA K. R. MATTHEWS DEPARTMENT OF MATHEMATICS UNIVERSITY OF QUEENSLAND Second Online Version, December 1998 Comments to the author at krm@maths.uq.edu.au Contents 1 LINEAR EQUATIONS

More information

2tdt 1 y = t2 + C y = which implies C = 1 and the solution is y = 1

2tdt 1 y = t2 + C y = which implies C = 1 and the solution is y = 1 Lectures - Week 11 General First Order ODEs & Numerical Methods for IVPs In general, nonlinear problems are much more difficult to solve than linear ones. Unfortunately many phenomena exhibit nonlinear

More information

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers.

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers. Chapter 3 Duality in Banach Space Modern optimization theory largely centers around the interplay of a normed vector space and its corresponding dual. The notion of duality is important for the following

More information

TOOLS FROM HARMONIC ANALYSIS

TOOLS FROM HARMONIC ANALYSIS TOOLS FROM HARMONIC ANALYSIS BRADLY STADIE Abstract. The Fourier transform can be thought of as a map that decomposes a function into oscillatory functions. In this paper, we will apply this decomposition

More information

Regression: Lecture 2

Regression: Lecture 2 Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

Worst-Case Bounds for Gaussian Process Models

Worst-Case Bounds for Gaussian Process Models Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis

More information

Lecture 3: Review of Linear Algebra

Lecture 3: Review of Linear Algebra ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters, transforms,

More information

MATH 590: Meshfree Methods

MATH 590: Meshfree Methods MATH 590: Meshfree Methods Chapter 2 Part 3: Native Space for Positive Definite Kernels Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Fall 2014 fasshauer@iit.edu MATH

More information

MAT2342 : Introduction to Applied Linear Algebra Mike Newman, fall Projections. introduction

MAT2342 : Introduction to Applied Linear Algebra Mike Newman, fall Projections. introduction MAT4 : Introduction to Applied Linear Algebra Mike Newman fall 7 9. Projections introduction One reason to consider projections is to understand approximate solutions to linear systems. A common example

More information

Learning gradients: prescriptive models

Learning gradients: prescriptive models Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Duke University May 11, 2007 Relevant papers Learning Coordinate Covariances via Gradients. Sayan

More information

Hilbert Spaces. Contents

Hilbert Spaces. Contents Hilbert Spaces Contents 1 Introducing Hilbert Spaces 1 1.1 Basic definitions........................... 1 1.2 Results about norms and inner products.............. 3 1.3 Banach and Hilbert spaces......................

More information

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706 TECHNICAL REPORT NO. 1091r April 2004, Revised December 2004 A Note on the Lasso and Related Procedures in Model

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection

More information

Chapter 1: Systems of linear equations and matrices. Section 1.1: Introduction to systems of linear equations

Chapter 1: Systems of linear equations and matrices. Section 1.1: Introduction to systems of linear equations Chapter 1: Systems of linear equations and matrices Section 1.1: Introduction to systems of linear equations Definition: A linear equation in n variables can be expressed in the form a 1 x 1 + a 2 x 2

More information

Classification 1: Linear regression of indicators, linear discriminant analysis

Classification 1: Linear regression of indicators, linear discriminant analysis Classification 1: Linear regression of indicators, linear discriminant analysis Ryan Tibshirani Data Mining: 36-462/36-662 April 2 2013 Optional reading: ISL 4.1, 4.2, 4.4, ESL 4.1 4.3 1 Classification

More information

LECTURE 1: SOURCES OF ERRORS MATHEMATICAL TOOLS A PRIORI ERROR ESTIMATES. Sergey Korotov,

LECTURE 1: SOURCES OF ERRORS MATHEMATICAL TOOLS A PRIORI ERROR ESTIMATES. Sergey Korotov, LECTURE 1: SOURCES OF ERRORS MATHEMATICAL TOOLS A PRIORI ERROR ESTIMATES Sergey Korotov, Institute of Mathematics Helsinki University of Technology, Finland Academy of Finland 1 Main Problem in Mathematical

More information