Optimal Design in Regression and Spline Smoothing


Optimal Design in Regression and Spline Smoothing

by

Jaerin Cho

A thesis submitted to the Department of Mathematics and Statistics in conformity with the requirements for the degree of Doctor of Philosophy

Queen's University
Kingston, Ontario, Canada
July 2007

Copyright © Jaerin Cho, 2007

ABSTRACT

This thesis represents an attempt to generalize the classical Theory of Optimal Design to popular regression models based on rational and spline approximations. The problem of finding optimal designs for such models can be reduced to solving certain minimax problems. Explicit solutions to such problems can be obtained only in a few selected models, such as polynomial regression. Even when an optimal design can be found, it has, from the point of view of modern nonparametric regression, certain drawbacks. For example, in the polynomial regression case, the optimal design depends crucially on the degree $m$ of the approximating polynomial; hence it can be used only when this degree is given or known in advance. We present a partial, but practical, solution to this problem. Namely, the so-called Super Chebyshev Design has been found, which does not depend on the degree $m$ of the underlying polynomial regression over a large range of $m$, and at the same time is asymptotically more than 90% efficient. Similar results are obtained in the case of rational regression, even though the exact form of the optimal design in this case remains unknown. Optimal designs in the case of spline interpolation are also currently unknown. This problem, however, has a simple solution in the case of cardinal spline interpolation. Until recently, this model had been practically unknown in modern nonparametric regression. We demonstrate the usefulness of Cardinal Kernel Spline Estimates in nonparametric regression by proving their asymptotic optimality in certain classes of smooth functions. In this way, we have found, for the first time, a theoretical justification of the well-known empirical observation that cubic splines suffice in most practical applications.

ACKNOWLEDGEMENT

I am thankful to my supervisor, Professor Boris Levit, for guiding me throughout the writing of this thesis. I am also thankful to the committee for providing comments and suggestions. Finally, I would like to express my love to my family.

CONTENTS

Abstract
Acknowledgement

Chapter 1. Introduction
1.1. Brief history of optimal design
1.2. The scope of the thesis
1.2.1. Extension of the concept of optimal design
1.2.2. The super Chebyshev design
1.2.3. The rational regression model
1.2.4. Cardinal spline smoothing

Chapter 2. Overview of the optimal design theory
2.1. Introduction
2.1.1. The statistical model
2.1.2. The classical concept of optimal designs
2.1.3. Extended concept of optimal designs
2.2. Criteria of Optimal Designs
2.2.1. Fisher information and maximum likelihood estimators
2.2.2. Minimax estimation
2.2.3. An application to Optimal Designs
2.2.4. An extension of the class of designs $\Xi_n$
2.3. Some properties of the information matrix $M(\xi)$
2.3.1. Convex compactness of $\mathcal{M}$
2.3.2. The quadratic risk $d(x, \xi)$
2.3.3. Strict concavity of $\log|M|$
2.3.4. Strict convexity of $M^{-1}$
2.4. The equivalence theorem
2.4.1. A stronger version of the equivalence theorem

Chapter 3. Designs in polynomial regression
3.1. Introduction
3.1.1. The goals of this chapter
3.2. Optimal design in the polynomial regression of a given order
3.2.1. The existence and uniqueness of the optimal design
3.2.2. Conditions defining the optimal design
3.2.3. Introduction of Jacobi polynomials
3.2.4. Derivation of the optimal design
3.2.5. Another method of deriving the optimal design
3.2.6. The quadratic risk of the optimal design
3.3. The Chebyshev design
3.4. The super Chebyshev design
3.4.1. $(\xi_o, a)$-systems
3.4.2. The super Chebyshev system
3.4.3. The efficiency of the super Chebyshev design

Chapter 4. Designs in Rational Regression
4.1. Introduction
4.1.1. The goals of this chapter
4.2. Optimal design in the rational regression of a given pole set
4.2.1. The existence and uniqueness of the optimal design
4.2.2. Conditions defining the optimal design
4.2.3. Derivation of the optimal design
4.2.4. The quadratic risk of the optimal design
4.3. The Chebyshev design in rational regression
4.3.1. Orthogonal system of rational functions on $U = \{z : |z| = 1\}$
4.3.2. An orthogonal basis for the Chebyshev design
4.3.3. The quadratic risk of the Chebyshev design
4.3.4. Asymptotic behavior of the quadratic risk $d(x, \xi_c)$
4.4. The super Chebyshev design
4.4.1. The existence of $(\xi_o, a)$-systems
4.4.2. Relative efficiency of the super Chebyshev design

Chapter 5. Nonparametric regression based on fundamental splines of cardinal interpolation
5.1. Introduction
5.1.1. The observation model
5.1.2. Kernel type estimators
5.1.3. Goals of this chapter
5.2. B-splines with equally spaced knots
5.2.1. Definitions
5.2.2. B-splines as probability densities
5.2.3. Asymptotic behavior of B-splines for large $m$
5.2.4. Generating functions
5.3. C-splines and their properties
5.3.1. Definitions
5.3.2. Examples
5.3.3. Asymptotic behavior of C-splines for large $x$
5.3.4. Fourier transforms $\hat{L}_m$
5.3.5. Further properties of $\hat{L}_m$
5.3.6. Asymptotic behavior of C-splines for large $m$
5.4. Kernel spline estimates
5.4.1. Poisson summation formula
5.4.2. A bound for the remainder terms
5.4.3. Quadratic risk of C-spline estimates
5.5. Asymptotic optimality of C-splines
5.5.1. Asymptotically optimal bandwidth
5.5.2. The main result
5.5.3. The simulation result
5.6. An application of the method of GCV
5.6.1. Definitions
5.6.2. Properties of GCV
5.6.3. An application to C-spline

References

9 Chapter. Introduction In the classical theory of regression, a set of values x, x 2,..., x n of an independent variable x is selected, and observations are made on a related response variable y corresponding to the selected x-values. If y i denotes the response corresponding to x i, it is then assumed that y, y 2,..., y n are uncorrelated variables with common variance σ 2. If it is assumed additionally that there is a vector of parameters θ = (θ, θ 2,..., θ m ) Θ R m such that the trend function f(x) = E(y x) is completely determined by the parameter θ, such models are referred to as parametric regression models. More specifically, if it is assumed that the trend function f(x) belongs to the linear subspace spanned by a given set of linearly independent functions f (x), f 2 (x),..., f m (x) the model is called a linear regression model. The most popular linear regression model is the polynomial regression model which assumes that f(x) = m j= θ jx j. Any linear regression model can be represented in the form of a standard linear model, where y = Xθ + ε, y = (y i ) n, X = (X ij ) = (f j (x i )) n m, θ = (θ j ) m and ε = (ε i ) n. () It is well known that if the matrix X X is non-singular, the least squares estimates ˆθ = (X X) X y are the unique minimum variance unbiased estimates of the corresponding components of θ and Cov(ˆθ θ) = σ 2 (X X). The corresponding trend estimate is ˆf(x) = m j= ˆθ j f j (x), and its error, at a given point x, has mean 0 and variance σ 2 h (x)(x X) h(x), where h(x) = (f (x), f 2 (x),..., f m (x)). The collection ξ = {x i, i =,..., n} of the observation points is commonly referred to as the design of the model. The goal of the theory of optimal designs is to determine the best designs in a way that is both practically meaningful and theoretically feasible. There are many optimality criteria. An optimality criterion which has been widely studied is that of D-optimality in which the determinant of the covariance matrix σ 2 (X X) is minimized (or, equivalently, the determinant of the Fisher information matrix σ 2 X X is maximized). Another criterion, so-called G-optimality, is concerned with the variance of the predicted response. A G-optimal design (or minimax design) is one which minimizes the maximum variance max x E( ˆf(x) f(x)) 2.

The minimax design has the clearest practical meaning. In this thesis, the term optimal design will be synonymous with minimax design.

1.1. Brief history of optimal design.

Historically, polynomial regression was the first model for which optimal designs were found, independently by different authors, whose findings later led to the foundation of the general theory of optimal design. It is one of the simplest cases of the general theory, and represents one of a relatively small number of examples for which optimal designs can be found explicitly. In the case $x \in [-1, 1]$, the optimal designs were worked out numerically for lower degree polynomials by Smith (1918) [44], up to the 5th degree. De La Garza (1954) [4] showed that for a polynomial of degree $m - 1$ on any given interval, the information matrix $X'X$ corresponding to observations made at any $k$ points, $k \ge m$, can always be attained from observations made at exactly $m$ points in the same interval. There are certain advantages when we apply this result to determining the optimal designs or the D-optimal designs. This result shows that in finding optimal designs, it is sufficient to consider designs $x_1, \ldots, x_n$ which contain only $m$ different points (cf. Lemma 3.1). Assuming that the design contains only $m$ different points $x_1, \ldots, x_m$, with proportions $p_i$ of the number of observations at each $x_i$,

$$\det(X'X) \propto \Big(\prod_{i=1}^{m} p_i\Big) (\det F)^2, \quad \text{where} \quad \det F = \det\big(x_i^{j-1}\big)_{i,j=1}^{m} = \prod_{1 \le k < l \le m} (x_k - x_l)$$

is the so-called Vandermonde determinant. Since $\prod_{i=1}^{m} p_i$ is maximal when $p_i = 1/m$ $(i = 1, 2, \ldots, m)$, the problem is to determine the observation points $x_1, \ldots, x_m \in [-1, 1]$ maximizing the Vandermonde determinant. The solution of this classical problem, also known as the problem of electrostatic equilibrium, goes back to Stieltjes (1885); see e.g. Szegő [45]. It turns out that the D-optimal design is given by the zeros of the $m$th degree polynomial $(1 - x^2)P'_{m-1}(x)$, where $P_{m-1}(x)$ is the classical Legendre polynomial of degree $m - 1$ (cf. Theorem 3.7).
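As a quick numerical companion to this classical result, the following sketch (NumPy; the helper name is ours) computes the support of the D-optimal design as the zeros of $(1 - x^2)P'_{m-1}(x)$:

```python
import numpy as np
from numpy.polynomial import legendre as leg

def d_optimal_nodes(m):
    """Support of the D-optimal design for polynomial regression with
    m parameters (degree m - 1) on [-1, 1]: the m zeros of the
    degree-m polynomial (1 - x^2) P'_{m-1}(x)."""
    c = np.zeros(m)
    c[m - 1] = 1.0                           # Legendre series for P_{m-1}
    interior = leg.legroots(leg.legder(c))   # zeros of P'_{m-1}
    return np.sort(np.concatenate(([-1.0], interior, [1.0])))

print(d_optimal_nodes(5))
# [-1.     -0.6547  0.      0.6547  1.    ]   (interior zeros: +-sqrt(3/7))
```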

Now let us assume that the observations $y_i$ are made at $m$ different observation points (nodes) $x_i \in [-1, 1]$, $i = 1, \ldots, m$. Note that in this case the problem of fitting a polynomial of degree $m - 1$ by least squares is equivalent to interpolating the observations $y_i$ by such a polynomial. If we replace the original power basis $1, x, \ldots, x^{m-1}$ by the basis provided by the fundamental polynomials of Lagrange interpolation with nodes $x_1, \ldots, x_m$,

$$L_i(x) = \frac{L(x)}{L'(x_i)(x - x_i)}, \quad \text{where} \quad L(x) = (x - x_1)(x - x_2)\cdots(x - x_m),$$

we can easily obtain

$$\hat f(x) = \sum_{j=1}^{m} \hat\theta_j x^{j-1} = \sum_{j=1}^{m} y_j L_j(x).$$

It then follows easily from (1) that

$$E(\hat f(x) - f(x))^2 = \sigma^2 \sum_{i=1}^{m} L_i^2(x).$$

Therefore $\xi = \{x_i\}$ is an optimal design if and only if it minimizes $\max_x \sum_{i=1}^{m} L_i^2(x)$ among all such designs (see Sec. 3.2). The problem of minimizing $\max_x \sum_{i=1}^{m} L_i^2(x)$ was first considered and solved by Fejér (1932) [10]. He proved that the design $\xi = \{x_i\}$ minimizing $\max_x \sum_{i=1}^{m} L_i^2(x)$ is again given by the zeros of the polynomial $(1 - x^2)P'_{m-1}(x)$. Using the De La Garza result [4], Guest [16] rediscovered in 1958 the explicit formula for the optimal polynomial design, obtained earlier by Fejér. At the same time, Hoel [17] found the D-optimal design for polynomial regression by combining the De La Garza result with the solution to the electrostatic equilibrium problem already known to Stieltjes. Thus he concluded that the optimal and the D-optimal designs are the same in the case of polynomial regression. This result prompted Kiefer and Wolfowitz to present their celebrated result, the so-called Equivalence Theorem. Surprisingly, it says that for any compact set $\mathcal{X}$, and for any system of continuous linearly independent functions $f_i(x)$, $i = 1, \ldots, m$, defined on $\mathcal{X}$, optimal designs are equivalent to D-optimal designs, and the minimax risk depends neither on the set $\mathcal{X}$, nor on the functions $f_i$ themselves, but exclusively on their number $m$. First, in [25], they proved this equivalence under the additional restriction that the design consists of $m$ points and puts equal mass at each point. Later, in [26], they proved the theorem in its most general form (see Theorem 2.15). A more general approach to the Equivalence Theorem was later proposed by Karlin and Studden [21]. They have shown that the existence of optimal designs in a general linear regression model follows, essentially, from von Neumann's classical Minimax Theorem; see Lemma 2.18.
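Fejér's characterization is easy to probe numerically. The sketch below (NumPy; the helper name is ours) evaluates $\max_x \sum_i L_i^2(x)$ for the Fejér nodes, where the maximum equals 1, and for equally spaced nodes, where it is larger and grows with $m$:

```python
import numpy as np
from numpy.polynomial import legendre as leg

def max_sum_lagrange_sq(nodes, grid=np.linspace(-1, 1, 4001)):
    """max over [-1, 1] of sum_i L_i(x)^2 for the given nodes."""
    nodes = np.asarray(nodes, float)
    total = np.zeros_like(grid)
    for i, xi in enumerate(nodes):
        others = np.delete(nodes, i)
        Li = np.prod((grid[:, None] - others) / (xi - others), axis=1)
        total += Li ** 2
    return total.max()

m = 7
c = np.zeros(m); c[-1] = 1.0
fejer = np.concatenate(([-1.0], leg.legroots(leg.legder(c)), [1.0]))
print(max_sum_lagrange_sq(fejer))                  # = 1 (Fejer's theorem)
print(max_sum_lagrange_sq(np.linspace(-1, 1, m)))  # > 1 for equispaced nodes
```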

The results of Guest and Hoel (or the corresponding earlier results of Fejér and Stieltjes), together with the Equivalence Theorem, form the basis of the classical Theory of Optimal Design. Detailed discussions of this theory and its foundations are presented in [9], [20] and [34]. Guest [16] also compared the variances of the fitted polynomials corresponding to the discrete uniform design and the optimal design. He showed that for most of the region the advantage lies with the uniform spacing method, whereas at the extremes of the region of interpolation the advantage lies decidedly with the optimal design. Generalizing earlier results going back to Stieltjes, Szegő ([45], p. 232) showed that, in general, the zeros $\theta_v$ of the $m$th degree Jacobi polynomial $P_m^{(\alpha,\beta)}(\cos\theta)$ satisfy the asymptotic relation

$$\theta_v = \frac{1}{m}\,\{v\pi + O(1)\},$$

with $O(1)$ uniformly bounded for all values of $v = 1, 2, \ldots, m$ and $m = 1, 2, \ldots$. In particular, this applies to the Legendre polynomials, since $P'_m(x) = \mathrm{const} \cdot P_{m-1}^{(1,1)}(x)$. Using this result, Fedorov ([9], pp. 91–92) notes that the sequence of optimal designs $\xi_m^*$ converges weakly to the (continuous) Chebyshev design $\xi_c$, where $\xi_c$ has the density $\pi^{-1}(1 - x^2)^{-1/2}$. Let us define the G-efficiency of a given design $\xi$ with respect to the optimal design $\xi_m^*$ as $d_m(\xi_m^*)/d_m(\xi)$, where $d_m(\xi) = \max_x E(\hat f(x) - f(x))^2$. Of course, here $\hat f(x)$ depends on the design $\xi$. Also denote the D-efficiency of a design $\xi$ as $(D_m(\xi)/D_m(\xi_m^*))^{1/m}$, where $D_m(\xi) = \det(X'X)$. A natural question is whether the limiting design $\xi_c$ is asymptotically efficient in the sense of the thus introduced G- and D-efficiencies. It turns out that the answer to this question is negative in the first case and positive in the second. Namely, Kiefer and Studden [24] showed that the limiting G-efficiency of $\xi_c$ is

$$\lim_{m \to \infty} \frac{d_m(\xi_m^*)}{d_m(\xi_c)} = \frac{1}{2}, \qquad (2)$$

while the limiting D-efficiency of $\xi_c$ is

$$\lim_{m \to \infty} \left( \frac{D_m(\xi_c)}{D_m(\xi_m^*)} \right)^{1/m} = 1.$$

Thus, surprisingly, limiting D-efficiency does not guarantee limiting G-efficiency, even though, for any given $m$, D-efficiency is equivalent to G-efficiency by the Equivalence Theorem. On the contrary, asymptotic G-efficiency of a sequence of designs guarantees their asymptotic D-efficiency (see [24]).
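The limit (2) can be checked numerically. In the sketch below (NumPy; the helper name is ours), the information matrix of $\xi_c$ is computed by Gauss–Chebyshev quadrature, which is exact for the polynomial entries involved, and the ratio $d_m(\xi_c)/m$ is seen to approach 2; recall that $d_m(\xi_m^*) = m$, by the Equivalence Theorem:

```python
import numpy as np

def max_risk_ratio_chebyshev(m, n_quad=400):
    """max_x d(x, xi_c) / m for the arcsine (Chebyshev) design; the
    information matrix is computed by Gauss-Chebyshev quadrature."""
    k = np.arange(1, n_quad + 1)
    x = np.cos((2 * k - 1) * np.pi / (2 * n_quad))   # quadrature nodes
    H = np.vander(x, m, increasing=True)
    M = H.T @ H / n_quad                             # M(xi_c)
    grid = np.linspace(-1, 1, 4001)
    G = np.vander(grid, m, increasing=True)
    d = np.einsum('ij,jk,ik->i', G, np.linalg.inv(M), G)
    return d.max() / m

for m in (5, 8, 12):    # power basis: keep m moderate for conditioning
    print(m, max_risk_ratio_chebyshev(m))   # (2m - 1)/m, approaching 2
```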

Kiefer and Studden [24] also showed that, for any $\varepsilon > 0$, a design $\xi_\varepsilon$ which doesn't depend on $m$ and satisfies

$$\liminf_{m \to \infty} \frac{d_m(\xi_m^*)}{d_m(\xi_\varepsilon)} \ge 1 - \varepsilon \qquad (3)$$

can be obtained by just a slight modification of the measure $\xi_c$, namely, by adding a small additional mass near the end-points $\pm 1$.

1.2. The scope of the thesis.

1.2.1. Extension of the concept of optimal design.

The classical Theory of Optimal Design was developed under the tacit assumption that only least squares estimates would be used. This leaves open the question of to what degree its optimality results hold when no such restrictive assumption is made about the class of possible estimates. Let us assume that the error terms $\varepsilon_i$ are normally distributed independent random variables, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$. In this thesis, the classical theory of optimal designs for the Gaussian observation model is generalized to arbitrary possible estimates under squared losses (see Sec. 2.2). We have chosen to present complete proofs of all the relevant results in the theory of optimal designs and, where possible, to indicate further simplifications of the classical results (see Ch. 2). In this context, we first discuss applications of the general theory of optimal designs to polynomial regression (see Sec. 3.2).

1.2.2. The super Chebyshev design.

As we already mentioned, in the polynomial regression of a given order $m$, there exists a unique optimal design $\xi_m^*$. However, $\xi_m^*$ depends crucially on the order $m$; consequently, it can be used only when $m$ is given a priori. Actually, the maximal risk can be very sensitive to the design: even a relatively small change of design can increase the risk several times. From the modern perspective, $m$ is hardly ever exactly known. Even though parametric regression undoubtedly remains the most prevalent approach to regression analysis, it requires very specific information about the degree of the polynomial regression function $f(x)$. Thus parametric regression is appropriate only when there is adequate information about the degree of $f(x)$. On the other hand, non-parametric regression generally only assumes that the regression curve belongs to some infinite dimensional collection of functions. Modern approaches to non-parametric regression require computation and careful comparison of a collection of different estimates, often using polynomial estimators of different, possibly high, degrees.

This leads to the problem of finding designs which would be nearly optimal for a larger set of possible values of the order $m$. To deal with this problem, we will show first that, in the case of the Chebyshev design $\xi_c$, which does not require the knowledge of $m$, the quadratic risk uniformly converges to the optimal risk value $m$ on any closed interval $[a, b] \subset (-1, 1)$, whereas its corresponding maximal risk on $[-1, 1]$ is only twice as large (cf. (2)) as that of the optimal design. Note that according to (3), there exists a design $\xi_\varepsilon$ which is almost optimal for all sufficiently large $m$. Unfortunately, the proof of the existence of such designs is not constructive; cf. [24]. Therefore, in Sec. 3.4, we introduce the so-called super Chebyshev design

$$\xi_s = \alpha\,\xi_c + (1 - \alpha)\,\xi_e,$$

where $\alpha$ is a constant, $\xi_c$ is the Chebyshev design and $\xi_e$ is a measure concentrated at the end-points: $\xi_e(-1) = \xi_e(1) = 1/2$. Since the maximal risk of the Chebyshev design $\xi_c$ occurs at the end-points, it is natural to try to improve it by adding more weight to the end-points. It will be demonstrated that, by choosing an appropriate $\alpha$, asymptotically as $m \to \infty$, $\xi_s$ reduces the maximal risk of the Chebyshev design roughly by 45%. In other words, $\xi_s$ is about 91% asymptotically efficient with respect to the maximal risk, compared to the Chebyshev design, which is only 50% efficient (see the numerical sketch below).

1.2.3. The rational regression model.

In Ch. 4, we venture into the intriguing area of optimal designs for the rational regression model, about which little was known. Rational functions are a classical tool in Approximation Theory. In many cases, they have proved to be a much more efficient tool for approximation than algebraic and trigonometric polynomials. In 1964, D. Newman [32] showed that the function $|x|$ can be uniformly approximated by rational functions much better than by polynomials. This result stimulated a great interest and substantial progress in the field of rational approximation. In this direction, we outline a general class of estimation procedures and characterize for them the optimal designs (see Sec. 4.2). We expect that models based on rational approximation may also be of significant value in non-parametric regression. If in a rational regression model the set of poles $\{w_i\}_{i=1}^{m}$ is given (see the definitions in Sec. 4.1), there exists a unique optimal design $\xi^*$ (see Sec. 4.2).
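Returning to the super Chebyshev design of Sec. 1.2.2, the sketch below (NumPy; the helper name is ours) scans the maximal risk of the mixture $\xi_s = \alpha\xi_c + (1-\alpha)\xi_e$ over a few values of $\alpha$ at a moderate $m$. The optimal choice of $\alpha$ and the asymptotics as $m \to \infty$ are derived in Ch. 3; the scan merely illustrates that adding end-point mass lowers the maximal risk below that of the pure Chebyshev design:

```python
import numpy as np

def max_risk_super_chebyshev(alpha, m, n_quad=400):
    """max_x d(x, xi_s) for xi_s = alpha*xi_c + (1 - alpha)*xi_e,
    where xi_e puts mass 1/2 at each end-point."""
    k = np.arange(1, n_quad + 1)
    x = np.cos((2 * k - 1) * np.pi / (2 * n_quad))
    H = np.vander(x, m, increasing=True)
    Mc = H.T @ H / n_quad                            # M(xi_c)
    He = np.vander(np.array([-1.0, 1.0]), m, increasing=True)
    Me = 0.5 * He.T @ He                             # M(xi_e)
    M = alpha * Mc + (1 - alpha) * Me
    grid = np.linspace(-1, 1, 4001)
    G = np.vander(grid, m, increasing=True)
    return np.einsum('ij,jk,ik->i', G, np.linalg.inv(M), G).max()

m = 10
for a in (1.0, 0.9, 0.8, 0.7):
    print(a, max_risk_super_chebyshev(a, m) / m)  # dips below the a = 1 value
```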

However, here again $\xi^*$ depends crucially on the set of poles $\{w_i\}_{i=1}^{m}$; consequently, it can be used only when this set is known a priori. As in polynomial regression, this leads to the problem of finding designs which would be nearly optimal for a larger collection of possible pole sets $\{w_i\}_{i=1}^{m}$. To deal with this problem, we generalize the approach used in polynomial regression to the more general class of rational regression models. In Sec. 4.3, we discuss those properties of the Chebyshev design $\xi_c$ which do not require specific knowledge of the pole set $\{w_j\}_{j=1}^{m}$. In particular, we show that the properties of $\xi_c$ in the rational regression model are similar to those in polynomial regression. Further, in Sec. 4.4, theoretical and numerical evidence will be presented showing that, by choosing an appropriate $\alpha$, asymptotically as $m \to \infty$, $\xi_s$ reduces the maximal risk of the Chebyshev design roughly by 45%.

1.2.4. Cardinal spline smoothing.

It is well known that when the degree of an interpolating polynomial increases, the overall approximation error may, in some cases, tend to infinity. In numerical analysis, this effect is known as Runge's phenomenon (see [4], p. 102). The problem can be avoided by using spline curves, which are piecewise polynomials. When trying to improve upon the approximation error, one can increase the number of polynomial pieces of which an interpolating spline is made up, instead of increasing the degree of these polynomials. It is generally accepted that the first mathematical reference to splines is the 1946 paper by Schoenberg [39], which is probably the first publication using the word "spline" in connection with smooth piecewise polynomial functions. Before computers were used, numerical calculations were done by hand. Although piecewise-defined functions, like the sign function or the step function, were of course well known, polynomials were generally preferred because they were easier to work with. With the advent of computers, splines have gained importance. They were first used as a replacement for polynomials in interpolation, then as a tool to construct smooth and flexible shapes in computer graphics. Splines are also broadly used in statistical applications requiring data smoothing. In this area, they successfully compete with yet another well-known statistical tool, kernel smoothing. In Ch. 5, we introduce a special type of kernel based on the so-called fundamental splines of cardinal interpolation. These functions are well known in Approximation Theory, where there is a huge literature devoted to their numerous properties (see e.g. [37], [38] and [39]). While many other kinds of splines, such as B-splines and natural splines, have long since found their way into Nonparametric Statistics, cardinal splines, to the best of our knowledge, have never been discussed in this context. This

is all the more surprising since, as we will demonstrate in Ch. 5, they have excellent statistical properties and represent quite a natural and useful tool for nonparametric estimation. Asymptotic optimality of the resulting nonparametric spline estimators will be discussed in the context of the functional classes

$$\mathcal{F} = \mathcal{F}(C, \gamma) = \big\{ f \in C(\mathbb{R}) : |\hat f(t)| \le C e^{-\gamma |t|} \big\},$$

where $\hat f(t)$ is the Fourier transform of $f(x)$, and the positive constants $C, \gamma > 0$, for the most part, will be considered given. These functional classes are well known in nonparametric estimation; see [2] and [29]. Related questions, such as comparative properties of B- and cardinal splines, numerical bounds on the accuracy of these splines, as well as the well-known generalized method of cross validation, will also be discussed in Ch. 5.
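As a closing illustration of Runge's phenomenon and of the way splines avoid it (cf. Sec. 1.2.4), the following sketch (Python with NumPy and SciPy, both assumed available) interpolates Runge's example $f(x) = 1/(1 + 25x^2)$ at equally spaced nodes, once with a single polynomial and once with a cubic spline:

```python
import numpy as np
from scipy.interpolate import CubicSpline

f = lambda x: 1.0 / (1.0 + 25.0 * x ** 2)    # Runge's example
xg = np.linspace(-1, 1, 2001)

for n in (11, 21, 41):
    xn = np.linspace(-1, 1, n)                # equally spaced nodes
    poly = np.polynomial.Polynomial.fit(xn, f(xn), n - 1)
    spline = CubicSpline(xn, f(xn))
    print(n,
          np.abs(poly(xg) - f(xg)).max(),     # polynomial error blows up
          np.abs(spline(xg) - f(xg)).max())   # spline error shrinks
```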

Chapter 2. Overview of the optimal design theory

2.1. Introduction.

In this chapter the classical theory of optimal designs for the Gaussian observation model ([9], [34]) is generalized to arbitrary possible estimates under squared losses (see Sec. 2.2). We present complete proofs of all relevant results in the theory of optimal designs, and indicate some possible simplifications, where possible.

2.1.1. The statistical model.

Let $F = (f_1(x), f_2(x), \ldots, f_m(x))$ denote a system of linearly independent continuous real-valued functions defined on a compact set $\mathcal{X} \subset \mathbb{R}^d$. Consider the regression model

$$y_i = f(x_i) + \varepsilon_i = \sum_{j=1}^{m} \theta_j f_j(x_i) + \varepsilon_i, \quad i = 1, \ldots, n. \qquad (4)$$

Here the $y_i$ are observations, $f(x)$ is an unknown regression function which belongs to the linear subspace $\mathcal{R}_F$ spanned by the $f_j$; $\theta_j$, $j = 1, 2, \ldots, m$, are regression parameters; $x_i \in \mathcal{X}$ are given observation points; and the $\varepsilon_i$ are independent normally distributed error terms, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$. We are interested in estimators exhibiting good accuracy over the whole class $\mathcal{R}_F$ of regression functions in (4). The collection $\xi_n = \{x_i\}$ of the observation points used in model (4) will be called the design. Here the $x_i$ are not necessarily distinct. The definition of the design $\xi_n$ will be generalized further in Sec. 2.2.4. The set of all designs will be denoted $\Xi_n$.

2.1.2. The classical concept of optimal designs.

Let us start with some known results. First, we will write equation (4) in the form of a standard linear model. Denote $y = (y_i)_{n \times 1}$, $X = (X_{ij}) = (f_j(x_i))_{n \times m}$, $\theta = (\theta_j)_{m \times 1}$ and $\varepsilon = (\varepsilon_i)_{n \times 1}$. Then equation (4) can be written in the matrix form of a standard linear model,

$$y = X\theta + \varepsilon. \qquad (5)$$

Let $\hat f_n(x) = h'(x)\hat\theta$ be the least squares estimator of $f(x)$, corresponding to a given design $\xi_n$, with the column-vector $h(x)$ given by $h(x) = (f_1(x), f_2(x), \ldots, f_m(x))'$. It is well known that, if the matrix $X'X$ is non-singular, the least squares estimator of the vector $\theta$ is $\hat\theta = (X'X)^{-1}X'y$, and the mean squared risk of estimating $f(x)$, at a given $x \in \mathcal{X}$, is

$$E(\hat f_n(x) - f(x))^2 = \sigma^2\, h'(x)(X'X)^{-1}h(x).$$

Note that this mean squared risk does not depend on the underlying function $f$. Denote the maximal risk, corresponding to a given design $\xi_n$,

$$R(F, \xi_n) = \max_{x \in \mathcal{X}} E(\hat f_n(x) - f(x))^2 = \max_{x \in \mathcal{X}} \sigma^2\, h'(x)(X'X)^{-1}h(x).$$

In the classical theory (see [9], [34]), the optimal design $\xi_n^o$ is any design minimizing the maximal risk, i.e. a design $\xi_n^o$ achieving

$$R(F) = \min_{\xi_n \in \Xi_n} R(F, \xi_n) = \min_{\xi_n \in \Xi_n}\ \max_{x \in \mathcal{X}} E(\hat f_n(x) - f(x))^2.$$

2.1.3. Extended concept of optimal designs.

Now let us extend the above approach from least squares estimators to arbitrary estimators. Let $\tilde f_n(x)$ be an arbitrary estimator of the unknown function based on the set of observations $y$. To indicate the dependence of this estimator on the underlying design $\xi_n$, we will use the notation $\tilde f_n \dashv \xi_n$. The accuracy of any estimator $\tilde f_n$ can be measured by

$$R(\tilde f_n, f) = \max_{x \in \mathcal{X}} E(\tilde f_n(x) - f(x))^2. \qquad (6)$$

Therefore, it is natural to look at the following minimax risk:

$$r(F, \xi_n) = \min_{\tilde f_n \dashv \xi_n}\ \max_{f \in \mathcal{R}_F}\ \max_{x \in \mathcal{X}} E(\tilde f_n(x) - f(x))^2.$$

Our final goal is to find an optimal design which minimizes this risk, i.e. a design $\xi_n^*$ achieving

$$r(F) = \min_{\xi_n \in \Xi_n} r(F, \xi_n) = \min_{\xi_n \in \Xi_n}\ \min_{\tilde f_n \dashv \xi_n}\ \max_{f \in \mathcal{R}_F}\ \max_{x \in \mathcal{X}} E(\tilde f_n(x) - f(x))^2. \qquad (7)$$

2.2. Criteria of Optimal Designs.

Preceding our further discussion of Optimal Designs, we introduce some basic statistical tools related to linear models. Under our assumptions, the joint density of the observations can be written as

$$p(y, \theta) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left( -\frac{\|y - X\theta\|^2}{2\sigma^2} \right).$$

2.2.1. Fisher information and maximum likelihood estimators.

Consider the score function

$$l(y, \theta) = \left( \frac{\partial \ln p(y, \theta)}{\partial \theta} \right)_{m \times 1}.$$

The Fisher information matrix, related to the density $p(y, \theta)$, is defined as

$$I(\theta) = E_\theta\, l(y, \theta)\, l'(y, \theta),$$

where the index $\theta$ indicates that the expectation is calculated under the density $p(y, \theta)$. An elementary calculation shows that, in the case of the normal density $p(y, \theta)$, the Fisher information matrix does not depend on $\theta$ and is given by $I = \sigma^{-2} X'X$. To indicate the dependence of this matrix on the design $\xi_n$, it is convenient to introduce the notation $X'X = nM(\xi_n)$.

In the terminology commonly used in Optimal Design Theory, $M(\xi_n)$ is also referred to as the information matrix. Note that $M(\xi_n)$ is an $m \times m$ square matrix, with entries

$$M(\xi_n)_{kl} = \frac{1}{n} \sum_{i=1}^{n} f_k(x_i) f_l(x_i). \qquad (8)$$

Let $\theta \in \Theta \subseteq \mathbb{R}^m$ be a vector of unknown parameters. Denote by $\hat\theta$ the Maximum Likelihood Estimator (MLE) of $\theta$. Clearly, in the case of the linear model (5), $\hat\theta$ coincides with the Least Squares Estimator (LSE) of $\theta$. It is well known that if the matrix $X'X$ is non-singular, the least squares estimator $\hat\theta = (X'X)^{-1}X'y$ is normally distributed: $\hat\theta \sim \mathcal{N}(\theta, I^{-1})$.

2.2.2. Minimax estimation.

One of the most compelling justifications of the use of the LSE is that, in the case of normal linear regression models such as (4) and (5), it can be shown to be minimax simultaneously for a broad variety of loss functions. Although the related results are well known, there seem to be no easy references, nor sufficiently simple proofs, specifically oriented towards linear regression. In this section we demonstrate that $\hat\theta$ is a minimax estimator for a variety of quadratic loss functions. In particular, we might be interested in the squared risk function

$$R_0(\tilde\theta, \theta) = E_\theta \|\tilde\theta - \theta\|^2. \qquad (9)$$

Note that in this case

$$R_0(\hat\theta, \theta) = \sigma^2\, \operatorname{Tr}\big((X'X)^{-1}\big). \qquad (10)$$

We might also be interested in estimating a linear functional of the form $\Psi = a'\theta$. Then, with the mean squared risk defined as

$$R_a(\tilde\Psi, \theta) = E_\theta (\tilde\Psi - \Psi)^2, \qquad (11)$$

the maximum likelihood estimate $\hat\Psi = a'\hat\theta$ satisfies $R_a(\hat\Psi, \theta) = \sigma^2 a'(X'X)^{-1}a$.
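Both risk expressions above are driven by $X'X = nM(\xi_n)$. As a small illustration of (8) (NumPy; the helper name and the basis chosen are ours):

```python
import numpy as np

def information_matrix(x_points, basis):
    """M(xi_n)_{kl} = (1/n) sum_i f_k(x_i) f_l(x_i)   -- cf. (8)."""
    x = np.asarray(x_points, float)
    H = np.column_stack([f(x) for f in basis])
    return H.T @ H / len(x)

# quadratic regression basis: f_1 = 1, f_2 = x, f_3 = x^2
basis = [np.ones_like, lambda x: x, lambda x: x ** 2]
M = information_matrix([-1.0, -1.0, 0.0, 0.0, 1.0, 1.0], basis)
print(M)   # the Fisher information of the sample is then n * M / sigma^2
```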

Theorem 2.1. If $\Theta = \mathbb{R}^m$, then the MLE $\hat\theta$ is minimax with respect to the risk functions (9) and (11).

Proof. Since the risk of the MLE does not depend on $\theta$, we can see from (10) that the minimax risk satisfies

$$r_0 := \min_{\tilde\theta}\ \max_{\theta \in \Theta} R_0(\tilde\theta, \theta) \le \sigma^2\, \operatorname{Tr}\big((X'X)^{-1}\big).$$

Hence we need only to show that

$$r_0 \ge \sigma^2\, \operatorname{Tr}\big((X'X)^{-1}\big). \qquad (12)$$

Note that if $\Pi$ is a prior distribution for $\theta$ such that

$$\Pi(\Theta) = 1, \qquad (13)$$

then the corresponding Bayes risk of an estimate $\tilde\theta$,

$$R_0(\tilde\theta, \Pi) = \int_\Theta R_0(\tilde\theta, \theta)\, d\Pi(\theta),$$

satisfies

$$R_0(\tilde\theta, \Pi) \le \max_{\theta \in \Theta} R_0(\tilde\theta, \theta). \qquad (14)$$

Consider the corresponding Bayes risk

$$R_0(\Pi) = \min_{\tilde\theta} R_0(\tilde\theta, \Pi).$$

It follows from (14) that, for any estimate $\tilde\theta$, $R_0(\Pi) \le \max_{\theta \in \Theta} R_0(\tilde\theta, \theta)$. Hence, for any prior distribution $\Pi$ satisfying (13),

$$R_0(\Pi) \le r_0. \qquad (15)$$

We will use this fact to demonstrate the validity of (12). For that, let us specify the prior distribution as

$$\Pi = \mathcal{N}\big(0,\ \sigma^2 \kappa^{-2} (X'X)^{-1}\big),$$

where the parameter $\kappa > 0$ will be chosen later. Under our assumptions, condition (13) holds trivially. Denote by $\pi(\theta)$ the density of $\Pi$ and by

$$f(y \mid \theta) \propto \exp\left( -\frac{\|y - X\theta\|^2}{2\sigma^2} \right)$$

the density of the observations $y$. Then, treating $y$ and $\theta$ as jointly random elements, the posterior density of $\theta$ is

$$f(\theta \mid y) \propto f(y \mid \theta)\,\pi(\theta) \propto \exp\left( -\frac{\|y - X\theta\|^2 + \kappa^2\, \theta' X'X \theta}{2\sigma^2} \right).$$

Simple algebra yields

$$f(\theta \mid y) \propto \exp\left( -\frac{(1 + \kappa^2)\,\theta' X'X \theta - 2\,\theta' X'y}{2\sigma^2} \right) \propto \exp\left( -\frac{1 + \kappa^2}{2\sigma^2} \left( \theta - \frac{(X'X)^{-1}X'y}{1 + \kappa^2} \right)' X'X \left( \theta - \frac{(X'X)^{-1}X'y}{1 + \kappa^2} \right) \right) = \exp\left( -\frac{1 + \kappa^2}{2\sigma^2} \left( \theta - \frac{\hat\theta}{1 + \kappa^2} \right)' X'X \left( \theta - \frac{\hat\theta}{1 + \kappa^2} \right) \right).$$

From Decision Theory it is well known that the Bayes estimate $\theta^*$, with respect to the mean squared risk (9), is the posterior mean

$$\theta^* = \frac{\hat\theta}{1 + \kappa^2},$$

and its Bayes risk $R_0(\Pi) = R_0(\theta^*, \Pi)$

coincides with the posterior risk

$$R_0(\Pi) = \frac{\sigma^2}{1 + \kappa^2}\, \operatorname{Tr}\big((X'X)^{-1}\big).$$

Using (15) and letting $\kappa \to 0$ proves the inequality (12), and thus our theorem, in the case of the risk function (9). In the case of the risk function (11), the corresponding Bayes estimate again coincides with the posterior mean

$$\Psi^* = \frac{a'\hat\theta}{1 + \kappa^2},$$

while both its posterior quadratic risk and Bayes risk are given by

$$R_a(\Pi) = \frac{\sigma^2}{1 + \kappa^2}\, a'(X'X)^{-1}a.$$

Our theorem follows from this equation as before.

Applying Theorem 2.1 to model (4), with $\hat f_n(x) = h'(x)\hat\theta$, where $h(x) = (f_1(x), f_2(x), \ldots, f_m(x))'$, we obtain the following minimax property of the LSE $\hat\theta$: for any $x_0 \in \mathcal{X}$,

$$\min_{\tilde f_n \dashv \xi_n}\ \max_{f \in \mathcal{R}_F} E(\tilde f_n(x_0) - f(x_0))^2 = \max_{f \in \mathcal{R}_F} E(\hat f_n(x_0) - f(x_0))^2 \qquad (16)$$

$$= \frac{\sigma^2}{n}\, h'(x_0) M^{-1}(\xi_n) h(x_0).$$

2.2.3. An application to Optimal Designs.

This minimax property of the maximum likelihood estimate can be easily extended to the risk function $R(\tilde f_n, f)$ in (6). Using (16), we obtain

$$r(F, \xi_n) = \min_{\tilde f_n \dashv \xi_n}\ \max_{f \in \mathcal{R}_F}\ \max_{x \in \mathcal{X}} E(\tilde f_n(x) - f(x))^2 \ge \min_{\tilde f_n \dashv \xi_n}\ \max_{f \in \mathcal{R}_F} E(\tilde f_n(x_0) - f(x_0))^2 = \max_{f \in \mathcal{R}_F} E(\hat f_n(x_0) - f(x_0))^2 = \frac{\sigma^2}{n}\, h'(x_0) M^{-1}(\xi_n) h(x_0).$$

Therefore,

$$r(F, \xi_n) \ge \frac{\sigma^2}{n}\, \max_{x \in \mathcal{X}} h'(x) M^{-1}(\xi_n) h(x).$$

On the other hand,

$$r(F, \xi_n) \le \max_{f \in \mathcal{R}_F}\ \max_{x \in \mathcal{X}} E(\hat f_n(x) - f(x))^2 = \frac{\sigma^2}{n}\, \max_{x \in \mathcal{X}} h'(x) M^{-1}(\xi_n) h(x).$$

It follows that the minimax risk (7) can be represented as

$$r(F) = \frac{\sigma^2}{n}\, \min_{\xi_n \in \Xi_n}\ \max_{x \in \mathcal{X}} h'(x) M^{-1}(\xi_n) h(x). \qquad (17)$$

This result shows that for the Gaussian observation model the optimal design $\xi_n^o$, in the traditional sense commonly used in the existing theory of Optimal Designs, essentially coincides with the design produced by a more rigorous approach which does not restrict the class of possible estimators exclusively to the LSE.

2.2.4. An extension of the class of designs $\Xi_n$.

Obviously, any design $\xi_n \in \Xi_n$ can be identified with the probability measure $\xi_n$ on $\mathcal{X}$ of the following special form:

$$\xi_n(x_i) = \frac{1}{n}, \quad i = 1, \ldots, n.$$

In terms of this measure, the matrix $M(\xi_n)$ has the form, cf. (8),

$$M(\xi_n) = \int_{\mathcal{X}} h(x) h'(x)\, d\xi_n(x).$$
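A small numerical sketch of the quantity appearing in (17) (NumPy; the helper name and the family of candidate designs are ours). For a quadratic trend on $[-1, 1]$, scanning symmetric three-point design measures shows that $\max_x h'(x)M^{-1}(\xi)h(x)$ is smallest, and equal to $m$, at equal weights:

```python
import numpy as np

def worst_case_variance(points, weights, m=3, grid=np.linspace(-1, 1, 2001)):
    """max_x h'(x) M^{-1}(xi) h(x) for a discrete design measure,
    with the polynomial basis h(x) = (1, x, ..., x^{m-1})' -- cf. (17)."""
    H = np.vander(np.asarray(points, float), m, increasing=True)
    M = H.T @ (np.asarray(weights, float)[:, None] * H)
    G = np.vander(grid, m, increasing=True)
    return np.einsum('ij,jk,ik->i', G, np.linalg.inv(M), G).max()

pts = [-1.0, 0.0, 1.0]
for w in (0.2, 1/3, 0.5):                    # weight placed at x = 0
    p = [(1 - w) / 2, w, (1 - w) / 2]
    print(w, worst_case_variance(pts, p))    # minimal value m = 3 at w = 1/3
```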

This allows us to extend the definition of the information matrix $M(\xi_n)$ to an arbitrary probability measure $\xi$ with support in $\mathcal{X}$, $\operatorname{supp}(\xi) \subseteq \mathcal{X}$, as

$$M(\xi) = \int_{\mathcal{X}} h(x) h'(x)\, d\xi(x). \qquad (18)$$

Denote by $\Xi$ the collection of all such designs. This generalization is quite natural and, in fact, is technically easier to work with. In the classical concept of Optimal Designs ([9], [34]), this has led to the study of the following minimax risk:

$$\min_{\xi \in \Xi}\ \max_{x \in \mathcal{X}} E(\hat f(x) - f(x))^2.$$

Under our more general approach, this can be replaced by a more general risk, without any restrictions on the class of possible estimates $\tilde f(x)$:

$$\min_{\xi \in \Xi}\ \min_{\tilde f \dashv \xi}\ \max_{f \in \mathcal{R}_F}\ \max_{x \in \mathcal{X}} E(\tilde f(x) - f(x))^2.$$

Note. The introduction of an arbitrary design measure $\xi$ may be a bit difficult to interpret in terms of practical applications. This can be helped, however, by Corollary 2.4 below, according to which a regression model with an arbitrary design measure can be replaced by an equivalent model with a discrete design measure $\xi$ supported on at most $j \le [(m+1)m/2] + 1$ design points. Applying a sufficiency argument to normal data reduces any such discrete design model to an equivalent inhomogeneous regression model, with a finite set of observation points $x_i$, $i = 1, \ldots, j$, and variances of the error terms varying with $\xi(x_i)$. As a result, for an arbitrary design model (4), there exists an equivalent design model, with a finite support set $x_i$, $i = 1, \ldots, j$, for some $j \le [(m+1)m/2] + 1$, having the following form:

$$y_i = f(x_i) + \frac{\varepsilon_i}{\sqrt{\xi(x_i)}} = \sum_{j=1}^{m} \theta_j f_j(x_i) + \frac{\varepsilon_i}{\sqrt{\xi(x_i)}}.$$

This equation exhibits a direct relation between the design measure $\xi(x_i)$ and the variance of the error terms. Obviously, in an inhomogeneous regression model, the latter weight can be an arbitrary number $\xi(x_i) \ge 0$.

2.3. Some properties of the information matrix $M(\xi)$.

The above relation (17) makes it obvious that the information matrix $M(\xi)$, defined in the general case by (18), plays a central role in determining the optimal designs $\xi_n^o$. In this subsection, we will discuss some properties of $M(\xi)$. In particular, these properties will be used below to prove the celebrated Equivalence Theorem and its ramifications. Denote by $\mathcal{M}$ the family of matrices $M(\xi)$ defined by (18) for arbitrary design measures $\xi$.

2.3.1. Convex compactness of $\mathcal{M}$.

Recall that a subset $S$ of a linear space is called convex if, for any $s_1 \in S$, $s_2 \in S$, and $0 \le \alpha \le 1$, the point $s = \alpha s_1 + (1 - \alpha)s_2$ again belongs to $S$. If $S$ is an arbitrary set in a linear space, then the set $\operatorname{conv}(S)$ of points

$$s = \sum_{i=1}^{k} \alpha_i s_i, \quad \text{where} \quad \sum_{i=1}^{k} \alpha_i = 1, \quad \alpha_i \ge 0, \quad s_i \in S \quad (i = 1, 2, \ldots, k;\ k = 1, 2, \ldots),$$

is called the convex hull of the set $S$. Note that if $S$ is a compact set (in a finite-dimensional space), then $\operatorname{conv}(S)$ is also compact. Denote by $\operatorname{cone}(S)$ the set of points

$$s = \sum_{i=1}^{k} \alpha_i s_i, \quad \alpha_i \ge 0, \quad s_i \in S \quad (i = 1, 2, \ldots, k;\ k = 1, 2, \ldots);$$

$\operatorname{cone}(S)$ is called the convex cone of the set $S$. Let us demonstrate first that $\mathcal{M}$ constitutes a convex compact set. Recall that we consider the regression model (4), for which a design measure $\xi$ has its support in a given compact subset $\mathcal{X} \subset \mathbb{R}^d$.

Lemma 2.2. Let $\mathcal{X} \subset \mathbb{R}^d$ be a compact set. Then $\mathcal{M}$ is the convex hull of the matrices $M(\xi[x])$ corresponding to those designs $\xi = \xi[x]$ whose support consists of a single point $x \in \mathcal{X}$. Moreover, $\mathcal{M}$ is a convex compact set in the linear space of all $m \times m$ matrices.

Proof. For the reader's convenience, we present below a short proof of this lemma; for more details see [9]. Let $S = \{ M(\xi[x]) : x \in \mathcal{X} \}$ and let $\operatorname{conv}(S)$ be the convex hull of the set $S$. Since the matrix $M(\xi)$ is linear in $\xi$, and the set $\Xi$ of all probability measures on $\mathcal{X}$ is convex, $\operatorname{conv}(S) \subseteq \mathcal{M}$. Note that the entries of the matrix $M(\xi[x])$ have the form $f_i(x)f_j(x)$, cf. (8). Since the functions $f_1, \ldots, f_m$ are continuous, so is the matrix $M(\xi[x])$ as a function of $x$. Since $\mathcal{X}$ is compact, the set $S$ is also compact, and therefore $\operatorname{conv}(S)$ is compact. Now let us demonstrate that $\mathcal{M} = \operatorname{conv}(S)$. In fact, it only remains to show that $\mathcal{M} \subseteq \operatorname{conv}(S)$. Since $\operatorname{conv}(S)$ is a closed convex set, there is a set $\mathcal{L}$ of linear functionals $L(\cdot)$ such that $s \in \operatorname{conv}(S)$ if and only if $L(s) \ge 0$ for all $L \in \mathcal{L}$. Thus we need to show that $L(M(\xi)) \ge 0$ for any $L \in \mathcal{L}$. Since any linear matrix functional $L(M)$ is of the form $L(M) = \sum_{i,j} a_{ij} M_{ij}$, we have, for any $L \in \mathcal{L}$,

$$L\big(M(\xi)\big) = \sum_{i,j} a_{ij} M_{ij}(\xi) = \sum_{i,j} a_{ij} \int f_i(x) f_j(x)\, d\xi(x) = \int \sum_{i,j=1}^{m} a_{ij} f_i(x) f_j(x)\, d\xi(x) = \int L\big(M(\xi[x])\big)\, d\xi(x) \ge 0.$$

The following classical result plays an important role in the Theory of Optimal Designs, cf. Corollary 2.4 below.

Theorem 2.3. [Carathéodory's Theorem.] Let $S$ be any subset of $\mathbb{R}^l$, $l \ge 1$. Then for any point $s$ in the convex hull $\operatorname{conv}(S)$, there exist $s_i \in S$, $i = 1, \ldots, l+1$, and a set of

$$\alpha_i \ge 0, \quad \sum_{i=1}^{l+1} \alpha_i = 1,$$

such that

$$s = \sum_{i=1}^{l+1} \alpha_i s_i.$$

Proof. For the reader's convenience, we present below a proof of this; for more details, see e.g. [43]. Given a set $S$ in $\mathbb{R}^l$, let $\tilde S$ be the subset of $\mathbb{R}^{l+1}$ which consists of all vectors of the form $(1, s)$ with $s \in S$, and let $\operatorname{cone}(\tilde S)$ be the convex cone generated by $\tilde S$. Let $C$ be the set of vectors $c$ in $\mathbb{R}^l$ such that $(1, c) \in \operatorname{cone}(\tilde S)$. Then it is easily seen that $C = \operatorname{conv}(S)$. Let $\tilde s \in \operatorname{cone}(\tilde S)$, so that $\tilde s = \lambda_1 \tilde s_1 + \cdots + \lambda_k \tilde s_k$, where $\lambda_i > 0$ and each $\tilde s_i \in \tilde S$. If $k > l + 1$, then the vectors $\tilde s_i$ are linearly dependent. Thus there exist $\mu_1, \ldots, \mu_k$, not all zero, such that

$$\mu_1 \tilde s_1 + \cdots + \mu_k \tilde s_k = 0.$$

Since the first component of each vector $\tilde s_i$ is $1$, we have $\mu_1 + \cdots + \mu_k = 0$, and so at least one of the coefficients $\mu_i$ is positive. Let $\lambda$ be the largest number such that $\lambda \mu_i \le \lambda_i$, $i = 1, \ldots, k$. Obviously, $\lambda$ is finite, since at least one of the $\mu_i$ is positive. Denote $\lambda_i' = \lambda_i - \lambda \mu_i$. Then

$$\lambda_1' \tilde s_1 + \cdots + \lambda_k' \tilde s_k = \lambda_1 \tilde s_1 + \cdots + \lambda_k \tilde s_k - \lambda(\mu_1 \tilde s_1 + \cdots + \mu_k \tilde s_k) = \tilde s.$$

Hence at least one of the coefficients $\lambda_i' = 0$. We have thus expressed $\tilde s$ as a positive linear combination of fewer than $k$ elements of $\tilde S$. This argument can clearly be repeated until $\tilde s$ has been expressed as a positive linear combination of at most $l + 1$ elements of $\tilde S$, since more than $l + 1$ elements are necessarily linearly dependent. In particular, all elements of the form $(1, c)$ can be expressed in this way.
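The argument above is constructive, and a direct transcription runs in a few lines. The sketch below (NumPy; the helper name and the tolerance-based bookkeeping are ours) repeatedly removes one point from a convex combination in $\mathbb{R}^l$ until at most $l + 1$ points remain:

```python
import numpy as np

def caratheodory_reduce(points, weights, tol=1e-12):
    """Reduce a convex combination of points in R^l to one that uses
    at most l+1 of them, following the linear-dependence argument."""
    P = np.asarray(points, float)                       # k x l
    lam = np.asarray(weights, float)
    l = P.shape[1]
    while len(P) > l + 1:
        # lifted vectors (1, s_i) are linearly dependent when k > l + 1
        A = np.hstack([np.ones((len(P), 1)), P]).T      # (l+1) x k
        mu = np.linalg.svd(A)[2][-1]                    # kernel vector of A
        if mu.max() <= 0:
            mu = -mu                                    # ensure some mu_i > 0
        pos = mu > tol
        step = np.min(lam[pos] / mu[pos])               # largest feasible step
        lam = lam - step * mu                           # >= 0, one (near) zero
        keep = lam > tol
        P, lam = P[keep], lam[keep]
    return P, lam

# five points on a segment in R^1 collapse to at most two,
# with the barycenter 0.5 preserved
P, lam = caratheodory_reduce([[0.0], [0.25], [0.5], [0.75], [1.0]], [0.2] * 5)
print(P.ravel(), lam, float(lam @ P[:, 0]))
```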

From this result it follows easily that the information matrix $M(\xi)$ of an arbitrary design $\xi$ coincides with the information matrix corresponding to some design with a finite support.

Corollary 2.4. Let $\xi \in \Xi$ be an arbitrary design. Then there exist a finite subset $x_1, \ldots, x_j \in \mathcal{X}$ and a set $p_1, \ldots, p_j \ge 0$, $\sum_{i=1}^{j} p_i = 1$, with $j \le [(m+1)m/2] + 1$, for which the matrix $M(\xi)$ can be represented in the form

$$M(\xi) = \sum_{i=1}^{j} p_i\, h(x_i) h'(x_i).$$

Proof. This result follows directly from Lemma 2.2 and the Carathéodory theorem.

By Corollary 2.4, in analyzing the set of all information matrices $M(\xi)$, $\xi \in \Xi$, we may assume, without loss of generality, that the design measure $\xi$ has a finite support.

2.3.2. The quadratic risk $d(x, \xi)$.

Recall that, for any design $\xi \in \Xi$, the minimax risk in (7) is determined by the $m \times m$ matrix $M(\xi)$ in (18) through the function $h'(x) M^{-1}(\xi) h(x)$, where $h(x) = (f_1(x), f_2(x), \ldots, f_m(x))'$. In what follows, we will denote the determinant of a square matrix $M$ by $|M|$. Since the information matrix $M(\xi)$ may happen to be singular, we define the risk function, in general, as

$$d(x, \xi) = \begin{cases} h'(x) M^{-1}(\xi) h(x), & \text{if } |M(\xi)| \neq 0, \\ \infty, & \text{if } |M(\xi)| = 0. \end{cases}$$

Note that in the special case when the functions $\{f_i(x),\ i = 1, \ldots, m\}$ are orthonormal with respect to the design $\xi$,

$$\int_{\mathcal{X}} f_i(x) f_j(x)\, d\xi(x) = \delta_{ij},$$

the quadratic risk can be represented as

$$d(x, \xi) = \sum_{i=1}^{m} f_i^2(x). \qquad (19)$$

By definition, a design $\xi$ is optimal if it minimizes $\max_{x \in \mathcal{X}} d(x, \xi)$. To carry our discussion of optimal designs further, we first establish some properties of $d(x, \xi)$.

Lemma 2.5. Let $\xi$ be an arbitrary design. If the matrix $M(\xi)$ is not singular, then

$$\int_{\mathcal{X}} d(x, \xi)\, d\xi(x) = m,$$

where $m$ is the number of unknown parameters.
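Before the proof, Lemma 2.5 is easy to verify numerically: for a discrete design, $\sum_i p_i\, d(x_i, \xi) = m$ exactly (a NumPy sketch with an arbitrary random design):

```python
import numpy as np

# Numerical check of Lemma 2.5: for a discrete design,
# sum_i p_i d(x_i, xi) = m exactly.
rng = np.random.default_rng(0)
m = 4
pts = rng.uniform(-1, 1, size=7)             # an arbitrary 7-point design
p = rng.dirichlet(np.ones(7))                # arbitrary design weights
H = np.vander(pts, m, increasing=True)       # basis 1, x, x^2, x^3
M = H.T @ (p[:, None] * H)
d = np.einsum('ij,jk,ik->i', H, np.linalg.inv(M), H)
print(p @ d)                                 # = m = 4, up to rounding
```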

Proof. For the reader's convenience, we present below a proof of this lemma; for more details, see e.g. [9], p. 69. Since $h'(x)M^{-1}(\xi)h(x)$ is a real number, obviously

$$\int_{\mathcal{X}} d(x, \xi)\, d\xi(x) = \int_{\mathcal{X}} h'(x) M^{-1}(\xi) h(x)\, d\xi(x) = \int_{\mathcal{X}} \operatorname{Tr}\big( h'(x) M^{-1}(\xi) h(x) \big)\, d\xi(x).$$

Moreover, since $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$ for any square matrices $A$ and $B$,

$$\operatorname{Tr}\big( h'(x) M^{-1}(\xi) h(x) \big) = \operatorname{Tr}\big( M^{-1}(\xi)\, h(x) h'(x) \big).$$

Thus, by the definition of $M(\xi)$,

$$\int_{\mathcal{X}} \operatorname{Tr}\big( h'(x) M^{-1}(\xi) h(x) \big)\, d\xi(x) = \operatorname{Tr}\left( M^{-1}(\xi) \int_{\mathcal{X}} h(x) h'(x)\, d\xi(x) \right) = \operatorname{Tr}\big( M^{-1}(\xi) M(\xi) \big) = m.$$

Corollary 2.6. For any design $\xi$, $\max_{x \in \mathcal{X}} d(x, \xi) \ge m$.

Lemma 2.7. Let $\xi$ be an arbitrary design. If the matrix $M(\xi)$ is not singular, then

$$\max_x d(x, \xi) = \max_\theta \operatorname{Tr}\big( M^{-1}(\xi) M(\theta) \big),$$

where the maximum on the right is taken over all designs $\theta$.

Proof. Similarly to the proof of Lemma 2.5,

$$d(x, \xi) = \operatorname{Tr}\big( M^{-1}(\xi) M(\xi[x]) \big),$$

where $\xi[x]$ is a design whose support consists of the single point $x \in \mathcal{X}$. So, trivially,

$$\max_x d(x, \xi) \le \max_\theta \operatorname{Tr}\big( M^{-1}(\xi) M(\theta) \big).$$

On the other hand,

$$\operatorname{Tr}\big( M^{-1}(\xi) M(\theta) \big) = \int_{\mathcal{X}} \operatorname{Tr}\big( M^{-1}(\xi) M(\xi[x]) \big)\, d\theta(x) = \int_{\mathcal{X}} d(x, \xi)\, d\theta(x) \le \max_x d(x, \xi).$$

Thus $\max_x d(x, \xi) \ge \max_\theta \operatorname{Tr}\big( M^{-1}(\xi) M(\theta) \big)$, and the lemma follows.

2.3.3. Strict concavity of $\log|M|$.

Here we establish some properties of the function $\log|M|$, $M \in \mathcal{M}$, which will be used below to prove the celebrated Equivalence Theorem. Recall that $\mathcal{M}$ is the convex set of all information matrices $M(\xi)$, $\xi \in \Xi$. Denote further

$$\mathcal{M}^+ = \{ M \in \mathcal{M} : |M| \neq 0 \} \subseteq \mathcal{M}$$

and

$$\mathcal{M}_\varepsilon = \{ M \in \mathcal{M} : \text{all eigenvalues of } M \text{ are} \ge \varepsilon \} \subseteq \mathcal{M}.$$

Lemma 2.8. $\mathcal{M}^+$ and $\mathcal{M}_\varepsilon$ are convex sets.

Proof. A symmetric semi-definite matrix $M$ is non-singular if and only if its smallest eigenvalue is positive:

$$\lambda_{\min} = \min_v\, (v, Mv) > 0,$$

where the minimum is taken over all normalized vectors $v$ such that $(v, v) = 1$; see e.g. [2]. Let $M_1, M_2 \in \mathcal{M}^+$. Then obviously, for any $0 \le \alpha \le 1$,

$$\min_v\, \big(v, (\alpha M_1 + (1-\alpha) M_2)v\big) = \min_v\, \big( \alpha (v, M_1 v) + (1-\alpha)(v, M_2 v) \big) \ge \alpha \min_v\, (v, M_1 v) + (1-\alpha) \min_v\, (v, M_2 v) > 0.$$

Similarly, if $M_1, M_2 \in \mathcal{M}_\varepsilon$,

$$\min_v\, \big(v, (\alpha M_1 + (1-\alpha) M_2)v\big) \ge \varepsilon.$$

Remark 1. It follows from the above proof that the matrix $\alpha M_1 + (1-\alpha) M_2$ is non-singular even if only one of the matrices, say $M_2$, is non-singular, and $0 \le \alpha < 1$.

Definition. Let $S$ be a convex set. A numerical function $f$ defined on $S$ is called concave if, for any $s_1, s_2 \in S$, $s_1 \neq s_2$, and any $0 < \alpha < 1$,

$$f(\alpha s_1 + (1-\alpha) s_2) \ge \alpha f(s_1) + (1-\alpha) f(s_2);$$

convex, if

$$f(\alpha s_1 + (1-\alpha) s_2) \le \alpha f(s_1) + (1-\alpha) f(s_2);$$

and strictly concave (respectively, strictly convex) if the corresponding inequality is strict. First we show that the function $\log|M|$, defined on the convex set $\mathcal{M}^+$, is strictly concave.

Lemma 2.9. The function $\log|M|$, $M \in \mathcal{M}^+$, is strictly concave.

Proof. For the reader's convenience, we present below a simplified proof of this lemma; for more details, see [9]. By Lemma 2.8, $\mathcal{M}^+$ is a convex set. Therefore, it is sufficient to show that for any $M_1, M_2 \in \mathcal{M}^+$ with $M_1 \neq M_2$, and any $0 < \alpha < 1$,

$$\log|M| > \alpha \log|M_1| + (1-\alpha) \log|M_2|,$$

where $M = \alpha M_1 + (1-\alpha) M_2$. Since $M_1$ and $M_2$ are both symmetric positive-definite, there exists a non-singular matrix $A$ such that $A M_1 A' = I$ and $A M_2 A' = B$, where $I$ is the identity matrix and $B = \operatorname{diag}(b_1, \ldots, b_m)$, $b_i > 0$, is a diagonal matrix (see [8], p. 466). Thus

$$|M| = |\alpha M_1 + (1-\alpha) M_2| = \big| A^{-1} (\alpha I + (1-\alpha) B)(A')^{-1} \big| = |A|^{-2} \prod_{i=1}^{m} \big( \alpha + (1-\alpha) b_i \big). \qquad (20)$$

On the other hand,

$$|M_1|^\alpha\, |M_2|^{1-\alpha} = \big| A^{-1}(A')^{-1} \big|^\alpha\, \big| A^{-1} B (A')^{-1} \big|^{1-\alpha} = |A|^{-2} \prod_{i=1}^{m} b_i^{1-\alpha}. \qquad (21)$$

Using the classical Young inequality in the form $x^\alpha y^{1-\alpha} \le \alpha x + (1-\alpha) y$, with equality only for $x = y$, we obtain from (20) and (21) that

$$|M| > |M_1|^\alpha\, |M_2|^{1-\alpha}.$$

Therefore,

$$\log|M| > \alpha \log|M_1| + (1-\alpha) \log|M_2|.$$

Let $A = A(t)$, $t \in \mathbb{R}$, be a smooth family of symmetric positive definite matrices, and denote $\dot A = \frac{dA}{dt}$. The following result shows how to differentiate the determinant $|A|$.

Lemma 2.10.

$$\frac{d}{dt} \log|A(t)| = \operatorname{Tr}\big( A^{-1} \dot A \big).$$

Proof. Note that, together with $A$, the matrix $A^{-1}$ is symmetric positive definite. Consider the integral of the density of a multivariate normal distribution with mean vector $0$ and covariance matrix $V = A^{-1}$:

$$\int_{\mathbb{R}^m} \frac{|A|^{1/2}}{(2\pi)^{m/2}} \exp\left( -\frac{1}{2}\, x' A x \right) dx = 1.$$

Differentiating under the integral sign (which is easy to justify in this case) gives

$$\int_{\mathbb{R}^m} \frac{|A|^{1/2}}{(2\pi)^{m/2}} \exp\left( -\frac{1}{2}\, x' A x \right) \left( \frac{1}{2}\, \frac{d \log|A|}{dt} - \frac{1}{2}\, x' \dot A x \right) dx = \frac{1}{2} \left( \frac{d \log|A|}{dt} - \operatorname{Tr}(V \dot A) \right) = 0.$$
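Lemma 2.10 admits a quick numerical sanity check by finite differences (NumPy; the matrices are random and serve only as an illustration):

```python
import numpy as np

# Numerical check of Lemma 2.10: d/dt log|A(t)| = Tr(A^{-1} dA/dt).
rng = np.random.default_rng(1)
G = rng.normal(size=(4, 4))
B = G @ G.T + 4.0 * np.eye(4)               # symmetric positive definite
C = rng.normal(size=(4, 4)); C = C + C.T    # symmetric direction dA/dt

A = lambda t: B + t * C                     # positive definite for small t
t, h = 0.1, 1e-6
lhs = (np.linalg.slogdet(A(t + h))[1]
       - np.linalg.slogdet(A(t - h))[1]) / (2.0 * h)
rhs = np.trace(np.linalg.inv(A(t)) @ C)
print(lhs, rhs)                             # agree to ~1e-8
```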

From the above lemma we obtain the following result, which will be essential below in proving the Equivalence Theorem.

Lemma 2.11. Let there be two design measures $\xi_1$ and $\xi_2$ with information matrices $M(\xi_1)$ and $M(\xi_2)$, respectively, such that $M(\xi_2) \in \mathcal{M}^+$. Denote $\xi = \alpha \xi_1 + (1-\alpha)\xi_2$, $0 \le \alpha < 1$. Then

$$\frac{d}{d\alpha} \log|M(\xi)| = \operatorname{Tr}\big( M^{-1}(\xi)\, (M(\xi_1) - M(\xi_2)) \big),$$

and

$$\frac{d}{d\alpha} \log|M(\xi)| \Big|_{\alpha=0} = \int_{\mathcal{X}} d(x, \xi_2)\, d\xi_1(x) - m.$$

Proof. For the reader's convenience, we present below a proof of this lemma; for more details, see e.g. [9]. Note that

$$M(\xi) = \int_{\mathcal{X}} h(x) h'(x)\, d\xi(x) = \alpha \int_{\mathcal{X}} h(x) h'(x)\, d\xi_1(x) + (1-\alpha) \int_{\mathcal{X}} h(x) h'(x)\, d\xi_2(x) = \alpha M(\xi_1) + (1-\alpha) M(\xi_2),$$

and, by Remark 1, $M(\xi)$ is non-singular. Using Lemma 2.10 we obtain

$$\frac{d}{d\alpha} \log|M(\xi)| = \operatorname{Tr}\left( M^{-1}(\xi)\, \frac{d}{d\alpha} M(\xi) \right) = \operatorname{Tr}\big( M^{-1}(\xi)\, (M(\xi_1) - M(\xi_2)) \big).$$

Moreover, since $M(\xi) = M(\xi_2)$ for $\alpha = 0$, we have

$$\frac{d}{d\alpha} \log|M(\xi)| \Big|_{\alpha=0} = \operatorname{Tr}\big( M^{-1}(\xi_2)(M(\xi_1) - M(\xi_2)) \big) = \operatorname{Tr}\big( M^{-1}(\xi_2) M(\xi_1) \big) - \operatorname{Tr}\big( M^{-1}(\xi_2) M(\xi_2) \big) = \operatorname{Tr}\big( M^{-1}(\xi_2) M(\xi_1) \big) - m = \int_{\mathcal{X}} d(x, \xi_2)\, d\xi_1(x) - m.$$

2.3.4. Strict convexity of $M^{-1}$.

Let $M_1$ and $M_2$ be two $m \times m$ matrices. We say that $M_1 > M_2$ ($M_1 \ge M_2$) if the matrix $M_1 - M_2$ is positive (semi-)definite. Note that this partial order is preserved under pre- and post-multiplication by any non-singular $m \times m$ matrix (respectively, any matrix) $A$: if $M_1 \ge M_2$, then $A M_1 A' \ge A M_2 A'$. In this subsection we will show that the map $M \mapsto M^{-1}$ is strictly convex, in the sense of the above partial ordering, on the set $\mathcal{M}^+$.

Lemma 2.12. Let $M_1$ and $M_2$ be symmetric positive-definite matrices. Then

a) for any $0 < \alpha < 1$,

$$\alpha M_1^{-1} + (1-\alpha) M_2^{-1} \ge \big( \alpha M_1 + (1-\alpha) M_2 \big)^{-1},$$

with equality if and only if $M_1 = M_2$;

b) for any symmetric positive semi-definite $m \times m$ matrix $A$ and any $0 < \alpha < 1$,

$$\alpha \operatorname{Tr}(M_1^{-1} A) + (1-\alpha) \operatorname{Tr}(M_2^{-1} A) \ge \operatorname{Tr}\big( (\alpha M_1 + (1-\alpha) M_2)^{-1} A \big),$$

with strict inequality if $A$ is non-singular.

Proof. a) Since $M_1$ and $M_2$ are positive definite, there exists a non-singular matrix $A$ such that $A M_1 A' = I$ and $A M_2 A' = B$, where $I$ is the identity matrix and $B = \operatorname{diag}(b_1, \ldots, b_m)$, $b_i > 0$, is a diagonal matrix (see [8], p. 466). Then

$$\alpha M_1^{-1} + (1-\alpha) M_2^{-1} = A' \operatorname{diag}\big( \alpha + (1-\alpha) b_i^{-1} \big) A, \qquad (22)$$

and

$$\big( \alpha M_1 + (1-\alpha) M_2 \big)^{-1} = A' \operatorname{diag}\big( (\alpha + (1-\alpha) b_i)^{-1} \big) A. \qquad (23)$$

Note that

$$\big( \alpha + (1-\alpha) b_i \big)\big( \alpha + (1-\alpha) b_i^{-1} \big) = \alpha^2 + (b_i + b_i^{-1})\,\alpha(1-\alpha) + (1-\alpha)^2 \ge \alpha^2 + 2\alpha(1-\alpha) + (1-\alpha)^2 = 1,$$

or

$$\alpha + (1-\alpha) b_i^{-1} \ge \big( \alpha + (1-\alpha) b_i \big)^{-1}, \qquad (24)$$

with equality if and only if $b_i = 1$. Therefore, by (22), (23) and (24), for any $0 < \alpha < 1$,

$$\alpha M_1^{-1} + (1-\alpha) M_2^{-1} \ge \big( \alpha M_1 + (1-\alpha) M_2 \big)^{-1}, \qquad (25)$$

with equality if and only if $M_1 = M_2$.

b) Pre- and post-multiplying (25) by $A^{1/2}$ shows that the matrix

$$A^{1/2} \big( \alpha M_1^{-1} + (1-\alpha) M_2^{-1} \big) A^{1/2} - A^{1/2} \big( \alpha M_1 + (1-\alpha) M_2 \big)^{-1} A^{1/2}$$

is positive semi-definite (positive definite, if $A$ is non-singular). Since $\operatorname{Tr}$ is a linear operator and $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$,

$$\alpha \operatorname{Tr}(M_1^{-1} A) + (1-\alpha) \operatorname{Tr}(M_2^{-1} A) - \operatorname{Tr}\big( (\alpha M_1 + (1-\alpha) M_2)^{-1} A \big) \ge 0,$$

with strict inequality if $A$ is non-singular.

Lemma 2.13. Let $M$ be a symmetric $m \times m$ positive-definite matrix. Then

$$\frac{1}{m} \operatorname{Tr} M \ge |M|^{1/m}.$$

Moreover, the equality holds if and only if $M$ is a multiple of the identity.

Proof. Denote by $\lambda_i$ the eigenvalues of $M$. Since $\operatorname{Tr} M = \sum_{i=1}^{m} \lambda_i$ and $|M| = \prod_{i=1}^{m} \lambda_i$, the result follows from the arithmetic–geometric means inequality.

Corollary 2.14. Let there be two design measures $\xi_1$ and $\xi_2$ with information matrices $M(\xi_1)$ and $M(\xi_2)$, respectively, such that $M(\xi_1), M(\xi_2) \in \mathcal{M}^+$. Then

$$\operatorname{Tr}\big( M^{-1}(\xi_1) M(\xi_2) \big) \ge m\, \frac{|M(\xi_2)|^{1/m}}{|M(\xi_1)|^{1/m}},$$

and equality occurs if and only if $M(\xi_2)$ is proportional to $M(\xi_1)$.

2.4. The equivalence theorem.

The purpose of this subsection is to provide a proof of the following theorem, due to Kiefer and Wolfowitz [26], which plays an important role in the Theory of Optimal Designs.

Theorem 2.15. [Equivalence Theorem.] The following assertions are equivalent:

(1) the design $\xi^*$ maximizes $|M(\xi)|$;

(2) the design $\xi^*$ minimizes $\max_x d(x, \xi)$;

(3) $\max_x d(x, \xi^*) = m$.

Proof. For the reader's convenience, we present below a proof of this theorem; for more details, see [26].

(1) $\Rightarrow$ (2) and (3). Since the set of matrices $M(\xi)$ is compact and the determinant is a continuous function, there exists a design which maximizes $|M(\xi)|$; let $\xi^*$ be such a design. Consider another design $\bar\xi$, and let $\xi = \alpha \bar\xi + (1-\alpha)\xi^*$. Since $|M(\xi^*)| \ge |M(\xi)|$, by Lemma 2.11 we have

$$\frac{d}{d\alpha} \log|M(\xi)| \Big|_{\alpha=0} = \int_{\mathcal{X}} d(x, \xi^*)\, d\bar\xi(x) - m \le 0.$$

Choosing $\bar\xi$ to concentrate its mass at one point $x \in \mathcal{X}$, we obtain

$$\int_{\mathcal{X}} d(x, \xi^*)\, d\bar\xi(x) - m = d(x, \xi^*) - m \le 0.$$

This means that $\max_x d(x, \xi^*) \le m$. On the other hand, by Lemma 2.5, for any design $\xi$, $\max_x d(x, \xi) \ge m$. Therefore $\xi^*$ minimizes $\max_x d(x, \xi)$, and $\max_x d(x, \xi^*) = m$.

(2) $\Rightarrow$ (1). Let a design $\xi^*$ minimize $\max_x d(x, \xi)$. In the previous step of the proof we have shown that $\min_\xi \max_x d(x, \xi) = m$; it follows that $\max_x d(x, \xi^*) = m$. Thus, for any other design $\bar\xi$,

$$\int_{\mathcal{X}} d(x, \xi^*)\, d\bar\xi(x) - m \le 0.$$

Assume that $\xi^*$ does not maximize $|M(\xi)|$. Since, by Lemma 2.9, $\log|M(\xi)|$ is a strictly concave function, there is another design $\bar\xi$ such that $\log|M(\tilde\xi)|$, where $\tilde\xi = \alpha \bar\xi + (1-\alpha)\xi^*$, is strictly increasing in $\alpha$, for all sufficiently small $\alpha$. Thus

$$\frac{d}{d\alpha} \log|M(\tilde\xi)| \Big|_{\alpha=0} = \int_{\mathcal{X}} d(x, \xi^*)\, d\bar\xi(x) - m > 0.$$

We have arrived at a contradiction.

(3) $\Rightarrow$ (2). The result follows trivially from Corollary 2.6.
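The three assertions of the theorem can be watched coinciding on a small example. For quadratic regression on $[-1, 1]$, the optimal design puts equal mass on $\{-1, 0, 1\}$ (the zeros of $(1 - x^2)P'_{m-1}(x)$ with $m = 3$; cf. Ch. 1); the sketch below (NumPy; the helper name is ours) shows that equal weights simultaneously maximize $|M(\xi)|$ and bring $\max_x d(x, \xi)$ down to $m = 3$:

```python
import numpy as np

def det_and_maxd(weights, pts=np.array([-1.0, 0.0, 1.0]), m=3,
                 grid=np.linspace(-1, 1, 2001)):
    """|M(xi)| and max_x d(x, xi) for a weighted design on three points."""
    H = np.vander(pts, m, increasing=True)
    M = H.T @ (np.asarray(weights)[:, None] * H)
    G = np.vander(grid, m, increasing=True)
    d = np.einsum('ij,jk,ik->i', G, np.linalg.inv(M), G)
    return np.linalg.det(M), d.max()

for w in ([1/3, 1/3, 1/3], [0.25, 0.5, 0.25], [0.4, 0.2, 0.4]):
    detM, maxd = det_and_maxd(np.array(w))
    print(w, round(detM, 4), round(maxd, 3))
# equal weights give the largest determinant and max_x d = m = 3
```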

The above result is somewhat surprising. It shows, in particular, that for any compact set $\mathcal{X}$, and for any system of continuous linearly independent functions $f_i(x)$, $i = 1, \ldots, m$, defined on $\mathcal{X}$, there always exists an optimal design $\xi^*$, and the minimax risk depends neither on the set $\mathcal{X}$ nor on the functions $f_i$ themselves, but exclusively on their number! On the other hand, finding an optimal design $\xi^*$ is often quite a formidable task. In many cases, solving this problem can be helped by the following result.

Corollary 2.16. For an optimal design $\xi^*$,

$$\xi^*\{ x : d(x, \xi^*) = m \} = 1.$$

Proof. Let $A = \{ x : d(x, \xi^*) < m \}$. Suppose $\xi^*(A) > 0$. Then, in view of part (3) of Theorem 2.15,

$$\int_{\mathcal{X}} d(x, \xi^*)\, d\xi^*(x) = \int_A d(x, \xi^*)\, d\xi^*(x) + \int_{\mathcal{X} \setminus A} d(x, \xi^*)\, d\xi^*(x) < \int_A m\, d\xi^*(x) + \int_{\mathcal{X} \setminus A} m\, d\xi^*(x) = m \int_{\mathcal{X}} d\xi^*(x) = m.$$

However, by Lemma 2.5, $\int_{\mathcal{X}} d(x, \xi^*)\, d\xi^*(x) = m$. This contradiction proves our result.

We will use the above corollary to prove a result which will be needed later for proving the uniqueness of the optimal design $\xi^*$ in the cases of polynomial and rational regression. We call $x$ an $a$-point of a given function $f$ if $f(x) = a$. The number of $a$-points of $f$, in a given interval, will be counted according to their multiplicities.

Lemma 2.17. Let $\mathcal{X} = [-1, 1]$. Assume that the functions $f_1, \ldots, f_m$ are continuously differentiable on $(-1, 1)$, and let $\xi^*$ be an optimal design. If the number of $m$-points of $d(x, \xi^*)$ in the closed interval $[-1, 1]$ does not exceed $2m - 2$, then $\xi^*$ concentrates

its mass at exactly $m$ points in the interval $[-1, 1]$, two of which coincide with the end-points.

Proof. Obviously, the support of $\xi^*$ must contain at least $m$ points, since otherwise the functions $f_1, \ldots, f_m$ are linearly dependent $\xi^*$-a.s., the matrix $M(\xi^*)$ is singular, and $|M(\xi^*)| = 0$. In particular, the support of $\xi^*$ must contain at least $m - 2$ points in the open interval $(-1, 1)$. There are only three alternatives: the support of $\xi^*$ contains (1) at least $m - 2$ points in $(-1, 1)$ plus the two end-points $\pm 1$; (2) at least $m - 1$ points in $(-1, 1)$ and one of the end-points; (3) at least $m$ points in $(-1, 1)$. Note that by Corollary 2.16, all points in the support of $\xi^*$ are $m$-points of the risk function $d(x, \xi^*)$. By part (3) of Theorem 2.15, they are also maximum points of $d(x, \xi^*)$; hence every such point lying in the open interval is an $m$-point of multiplicity at least $2$. This shows that in case (1), the support of $\xi^*$ must contain exactly $m - 2$ points in the open interval $(-1, 1)$. In case (2), the number of $m$-points of $d(x, \xi^*)$ would be at least $2(m-1) + 1$, contradicting the assumption. In case (3), the number of $m$-points of $d(x, \xi^*)$ would be at least $2m$, which again is impossible.

2.4.1. A stronger version of the equivalence theorem.

In view of the importance of the Equivalence Theorem (Theorem 2.15), we prove below a stronger version of this theorem using ideas of Game Theory. This approach was first introduced in [20]. In general, a game with two players, I and II, is defined by a pay-off function $K = K(\xi, \theta)$, here interpreted as the gain of Player II when Player I chooses a strategy $\xi$ and Player II independently chooses a strategy $\theta$. Denote by $\Xi$ the set of all strategies available to Player I and by $\Theta$ the set of all strategies for Player II.

Lemma 2.18 (The Main Theorem of Game Theory; see e.g. [20], pp. 4-6). Let $\Xi$ and $\Theta$ be compact convex sets, and let the pay-off function $K = K(\xi, \theta)$ be continuous on $\Xi \times \Theta$, concave in $\theta$ for any given $\xi \in \Xi$, and convex in $\xi$ for any given $\theta \in \Theta$. Then

$$\max_{\theta \in \Theta}\ \min_{\xi \in \Xi} K(\xi, \theta) = \min_{\xi \in \Xi}\ \max_{\theta \in \Theta} K(\xi, \theta) := v.$$

Furthermore, for both players there exist optimal strategies $\xi^o \in \Xi$ and $\theta^o \in \Theta$, such that

$$K(\xi^o, \theta) \le v \le K(\xi, \theta^o), \quad \text{for all } \theta \in \Theta \text{ and all } \xi \in \Xi.$$


More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

We consider the problem of finding a polynomial that interpolates a given set of values:

We consider the problem of finding a polynomial that interpolates a given set of values: Chapter 5 Interpolation 5. Polynomial Interpolation We consider the problem of finding a polynomial that interpolates a given set of values: x x 0 x... x n y y 0 y... y n where the x i are all distinct.

More information

Elementary linear algebra

Elementary linear algebra Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

REAL AND COMPLEX ANALYSIS

REAL AND COMPLEX ANALYSIS REAL AND COMPLE ANALYSIS Third Edition Walter Rudin Professor of Mathematics University of Wisconsin, Madison Version 1.1 No rights reserved. Any part of this work can be reproduced or transmitted in any

More information

Assignment 1: From the Definition of Convexity to Helley Theorem

Assignment 1: From the Definition of Convexity to Helley Theorem Assignment 1: From the Definition of Convexity to Helley Theorem Exercise 1 Mark in the following list the sets which are convex: 1. {x R 2 : x 1 + i 2 x 2 1, i = 1,..., 10} 2. {x R 2 : x 2 1 + 2ix 1x

More information

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical

More information

Lecture Notes in Mathematics. Arkansas Tech University Department of Mathematics. The Basics of Linear Algebra

Lecture Notes in Mathematics. Arkansas Tech University Department of Mathematics. The Basics of Linear Algebra Lecture Notes in Mathematics Arkansas Tech University Department of Mathematics The Basics of Linear Algebra Marcel B. Finan c All Rights Reserved Last Updated November 30, 2015 2 Preface Linear algebra

More information

Quadratic reciprocity (after Weil) 1. Standard set-up and Poisson summation

Quadratic reciprocity (after Weil) 1. Standard set-up and Poisson summation (September 17, 010) Quadratic reciprocity (after Weil) Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ I show that over global fields (characteristic not ) the quadratic norm residue

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

CHAPTER 3 Further properties of splines and B-splines

CHAPTER 3 Further properties of splines and B-splines CHAPTER 3 Further properties of splines and B-splines In Chapter 2 we established some of the most elementary properties of B-splines. In this chapter our focus is on the question What kind of functions

More information

Lecture 3: Statistical Decision Theory (Part II)

Lecture 3: Statistical Decision Theory (Part II) Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical

More information

Optimal designs for estimating the slope of a regression

Optimal designs for estimating the slope of a regression Optimal designs for estimating the slope of a regression Holger Dette Ruhr-Universität Bochum Fakultät für Mathematik 4480 Bochum, Germany e-mail: holger.dette@rub.de Viatcheslav B. Melas St. Petersburg

More information

Spatial Process Estimates as Smoothers: A Review

Spatial Process Estimates as Smoothers: A Review Spatial Process Estimates as Smoothers: A Review Soutir Bandyopadhyay 1 Basic Model The observational model considered here has the form Y i = f(x i ) + ɛ i, for 1 i n. (1.1) where Y i is the observed

More information

Optimization Theory. A Concise Introduction. Jiongmin Yong

Optimization Theory. A Concise Introduction. Jiongmin Yong October 11, 017 16:5 ws-book9x6 Book Title Optimization Theory 017-08-Lecture Notes page 1 1 Optimization Theory A Concise Introduction Jiongmin Yong Optimization Theory 017-08-Lecture Notes page Optimization

More information

Checking Consistency. Chapter Introduction Support of a Consistent Family

Checking Consistency. Chapter Introduction Support of a Consistent Family Chapter 11 Checking Consistency 11.1 Introduction The conditions which define a consistent family of histories were stated in Ch. 10. The sample space must consist of a collection of mutually orthogonal

More information

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space. Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 7 Interpolation Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

ELEMENTARY LINEAR ALGEBRA

ELEMENTARY LINEAR ALGEBRA ELEMENTARY LINEAR ALGEBRA K R MATTHEWS DEPARTMENT OF MATHEMATICS UNIVERSITY OF QUEENSLAND First Printing, 99 Chapter LINEAR EQUATIONS Introduction to linear equations A linear equation in n unknowns x,

More information

Machine Learning 2017

Machine Learning 2017 Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section

More information

2. Intersection Multiplicities

2. Intersection Multiplicities 2. Intersection Multiplicities 11 2. Intersection Multiplicities Let us start our study of curves by introducing the concept of intersection multiplicity, which will be central throughout these notes.

More information

4.6 Bases and Dimension

4.6 Bases and Dimension 46 Bases and Dimension 281 40 (a) Show that {1,x,x 2,x 3 } is linearly independent on every interval (b) If f k (x) = x k for k = 0, 1,,n, show that {f 0,f 1,,f n } is linearly independent on every interval

More information

Chapter 7. Extremal Problems. 7.1 Extrema and Local Extrema

Chapter 7. Extremal Problems. 7.1 Extrema and Local Extrema Chapter 7 Extremal Problems No matter in theoretical context or in applications many problems can be formulated as problems of finding the maximum or minimum of a function. Whenever this is the case, advanced

More information

A Do It Yourself Guide to Linear Algebra

A Do It Yourself Guide to Linear Algebra A Do It Yourself Guide to Linear Algebra Lecture Notes based on REUs, 2001-2010 Instructor: László Babai Notes compiled by Howard Liu 6-30-2010 1 Vector Spaces 1.1 Basics Definition 1.1.1. A vector space

More information

On John type ellipsoids

On John type ellipsoids On John type ellipsoids B. Klartag Tel Aviv University Abstract Given an arbitrary convex symmetric body K R n, we construct a natural and non-trivial continuous map u K which associates ellipsoids to

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

A tailor made nonparametric density estimate

A tailor made nonparametric density estimate A tailor made nonparametric density estimate Daniel Carando 1, Ricardo Fraiman 2 and Pablo Groisman 1 1 Universidad de Buenos Aires 2 Universidad de San Andrés School and Workshop on Probability Theory

More information

2.2. Show that U 0 is a vector space. For each α 0 in F, show by example that U α does not satisfy closure.

2.2. Show that U 0 is a vector space. For each α 0 in F, show by example that U α does not satisfy closure. Hints for Exercises 1.3. This diagram says that f α = β g. I will prove f injective g injective. You should show g injective f injective. Assume f is injective. Now suppose g(x) = g(y) for some x, y A.

More information

VARIETIES WITHOUT EXTRA AUTOMORPHISMS I: CURVES BJORN POONEN

VARIETIES WITHOUT EXTRA AUTOMORPHISMS I: CURVES BJORN POONEN VARIETIES WITHOUT EXTRA AUTOMORPHISMS I: CURVES BJORN POONEN Abstract. For any field k and integer g 3, we exhibit a curve X over k of genus g such that X has no non-trivial automorphisms over k. 1. Statement

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

CS 450 Numerical Analysis. Chapter 8: Numerical Integration and Differentiation

CS 450 Numerical Analysis. Chapter 8: Numerical Integration and Differentiation Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright c 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80

More information

x x2 2 + x3 3 x4 3. Use the divided-difference method to find a polynomial of least degree that fits the values shown: (b)

x x2 2 + x3 3 x4 3. Use the divided-difference method to find a polynomial of least degree that fits the values shown: (b) Numerical Methods - PROBLEMS. The Taylor series, about the origin, for log( + x) is x x2 2 + x3 3 x4 4 + Find an upper bound on the magnitude of the truncation error on the interval x.5 when log( + x)

More information

0. Introduction 1 0. INTRODUCTION

0. Introduction 1 0. INTRODUCTION 0. Introduction 1 0. INTRODUCTION In a very rough sketch we explain what algebraic geometry is about and what it can be used for. We stress the many correlations with other fields of research, such as

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

Reproducing Kernel Hilbert Spaces

Reproducing Kernel Hilbert Spaces 9.520: Statistical Learning Theory and Applications February 10th, 2010 Reproducing Kernel Hilbert Spaces Lecturer: Lorenzo Rosasco Scribe: Greg Durrett 1 Introduction In the previous two lectures, we

More information

Math 3108: Linear Algebra

Math 3108: Linear Algebra Math 3108: Linear Algebra Instructor: Jason Murphy Department of Mathematics and Statistics Missouri University of Science and Technology 1 / 323 Contents. Chapter 1. Slides 3 70 Chapter 2. Slides 71 118

More information

Some Background Material

Some Background Material Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important

More information

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88 Math Camp 2010 Lecture 4: Linear Algebra Xiao Yu Wang MIT Aug 2010 Xiao Yu Wang (MIT) Math Camp 2010 08/10 1 / 88 Linear Algebra Game Plan Vector Spaces Linear Transformations and Matrices Determinant

More information

Optimal discrimination designs

Optimal discrimination designs Optimal discrimination designs Holger Dette Ruhr-Universität Bochum Fakultät für Mathematik 44780 Bochum, Germany e-mail: holger.dette@ruhr-uni-bochum.de Stefanie Titoff Ruhr-Universität Bochum Fakultät

More information

Min-Rank Conjecture for Log-Depth Circuits

Min-Rank Conjecture for Log-Depth Circuits Min-Rank Conjecture for Log-Depth Circuits Stasys Jukna a,,1, Georg Schnitger b,1 a Institute of Mathematics and Computer Science, Akademijos 4, LT-80663 Vilnius, Lithuania b University of Frankfurt, Institut

More information

n 1 f n 1 c 1 n+1 = c 1 n $ c 1 n 1. After taking logs, this becomes

n 1 f n 1 c 1 n+1 = c 1 n $ c 1 n 1. After taking logs, this becomes Root finding: 1 a The points {x n+1, }, {x n, f n }, {x n 1, f n 1 } should be co-linear Say they lie on the line x + y = This gives the relations x n+1 + = x n +f n = x n 1 +f n 1 = Eliminating α and

More information

ON THE CHEBYSHEV POLYNOMIALS. Contents. 2. A Result on Linear Functionals on P n 4 Acknowledgments 7 References 7

ON THE CHEBYSHEV POLYNOMIALS. Contents. 2. A Result on Linear Functionals on P n 4 Acknowledgments 7 References 7 ON THE CHEBYSHEV POLYNOMIALS JOSEPH DICAPUA Abstract. This paper is a short exposition of several magnificent properties of the Chebyshev polynomials. The author illustrates how the Chebyshev polynomials

More information

MTH Linear Algebra. Study Guide. Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education

MTH Linear Algebra. Study Guide. Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education MTH 3 Linear Algebra Study Guide Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education June 3, ii Contents Table of Contents iii Matrix Algebra. Real Life

More information

Vectors in Function Spaces

Vectors in Function Spaces Jim Lambers MAT 66 Spring Semester 15-16 Lecture 18 Notes These notes correspond to Section 6.3 in the text. Vectors in Function Spaces We begin with some necessary terminology. A vector space V, also

More information

September Math Course: First Order Derivative

September Math Course: First Order Derivative September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which

More information

Linear Methods for Prediction

Linear Methods for Prediction This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Semidefinite Programming

Semidefinite Programming Semidefinite Programming Notes by Bernd Sturmfels for the lecture on June 26, 208, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra The transition from linear algebra to nonlinear algebra has

More information

GEOMETRIC CONSTRUCTIONS AND ALGEBRAIC FIELD EXTENSIONS

GEOMETRIC CONSTRUCTIONS AND ALGEBRAIC FIELD EXTENSIONS GEOMETRIC CONSTRUCTIONS AND ALGEBRAIC FIELD EXTENSIONS JENNY WANG Abstract. In this paper, we study field extensions obtained by polynomial rings and maximal ideals in order to determine whether solutions

More information

Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016

Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016 Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall 206 2 Nov 2 Dec 206 Let D be a convex subset of R n. A function f : D R is convex if it satisfies f(tx + ( t)y) tf(x)

More information

GAUSS CIRCLE PROBLEM

GAUSS CIRCLE PROBLEM GAUSS CIRCLE PROBLEM 1. Gauss circle problem We begin with a very classical problem: how many lattice points lie on or inside the circle centered at the origin and with radius r? (In keeping with the classical

More information

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Alberto Bressan ) and Khai T. Nguyen ) *) Department of Mathematics, Penn State University **) Department of Mathematics,

More information

TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM

TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM H. E. Krogstad, IMF, Spring 2012 Karush-Kuhn-Tucker (KKT) Theorem is the most central theorem in constrained optimization, and since the proof is scattered

More information

Z-estimators (generalized method of moments)

Z-estimators (generalized method of moments) Z-estimators (generalized method of moments) Consider the estimation of an unknown parameter θ in a set, based on data x = (x,...,x n ) R n. Each function h(x, ) on defines a Z-estimator θ n = θ n (x,...,x

More information

A NICE PROOF OF FARKAS LEMMA

A NICE PROOF OF FARKAS LEMMA A NICE PROOF OF FARKAS LEMMA DANIEL VICTOR TAUSK Abstract. The goal of this short note is to present a nice proof of Farkas Lemma which states that if C is the convex cone spanned by a finite set and if

More information

Vector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms.

Vector Spaces. Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms. Vector Spaces Vector space, ν, over the field of complex numbers, C, is a set of elements a, b,..., satisfying the following axioms. For each two vectors a, b ν there exists a summation procedure: a +

More information

Coercive polynomials and their Newton polytopes

Coercive polynomials and their Newton polytopes Coercive polynomials and their Newton polytopes Tomáš Bajbar Oliver Stein # August 1, 2014 Abstract Many interesting properties of polynomials are closely related to the geometry of their Newton polytopes.

More information

Taylor and Laurent Series

Taylor and Laurent Series Chapter 4 Taylor and Laurent Series 4.. Taylor Series 4... Taylor Series for Holomorphic Functions. In Real Analysis, the Taylor series of a given function f : R R is given by: f (x + f (x (x x + f (x

More information

NOTES ON CALCULUS OF VARIATIONS. September 13, 2012

NOTES ON CALCULUS OF VARIATIONS. September 13, 2012 NOTES ON CALCULUS OF VARIATIONS JON JOHNSEN September 13, 212 1. The basic problem In Calculus of Variations one is given a fixed C 2 -function F (t, x, u), where F is defined for t [, t 1 ] and x, u R,

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Laurenz Wiskott Institute for Theoretical Biology Humboldt-University Berlin Invalidenstraße 43 D-10115 Berlin, Germany 11 March 2004 1 Intuition Problem Statement Experimental

More information

Definition 5.1. A vector field v on a manifold M is map M T M such that for all x M, v(x) T x M.

Definition 5.1. A vector field v on a manifold M is map M T M such that for all x M, v(x) T x M. 5 Vector fields Last updated: March 12, 2012. 5.1 Definition and general properties We first need to define what a vector field is. Definition 5.1. A vector field v on a manifold M is map M T M such that

More information

CONCEPT OF DENSITY FOR FUNCTIONAL DATA

CONCEPT OF DENSITY FOR FUNCTIONAL DATA CONCEPT OF DENSITY FOR FUNCTIONAL DATA AURORE DELAIGLE U MELBOURNE & U BRISTOL PETER HALL U MELBOURNE & UC DAVIS 1 CONCEPT OF DENSITY IN FUNCTIONAL DATA ANALYSIS The notion of probability density for a

More information

Some Remarks on the Discrete Uncertainty Principle

Some Remarks on the Discrete Uncertainty Principle Highly Composite: Papers in Number Theory, RMS-Lecture Notes Series No. 23, 2016, pp. 77 85. Some Remarks on the Discrete Uncertainty Principle M. Ram Murty Department of Mathematics, Queen s University,

More information

A matrix over a field F is a rectangular array of elements from F. The symbol

A matrix over a field F is a rectangular array of elements from F. The symbol Chapter MATRICES Matrix arithmetic A matrix over a field F is a rectangular array of elements from F The symbol M m n (F ) denotes the collection of all m n matrices over F Matrices will usually be denoted

More information

Math 321 Final Examination April 1995 Notation used in this exam: N. (1) S N (f,x) = f(t)e int dt e inx.

Math 321 Final Examination April 1995 Notation used in this exam: N. (1) S N (f,x) = f(t)e int dt e inx. Math 321 Final Examination April 1995 Notation used in this exam: N 1 π (1) S N (f,x) = f(t)e int dt e inx. 2π n= N π (2) C(X, R) is the space of bounded real-valued functions on the metric space X, equipped

More information

Lecture 8: Information Theory and Statistics

Lecture 8: Information Theory and Statistics Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang

More information

1 Math 241A-B Homework Problem List for F2015 and W2016

1 Math 241A-B Homework Problem List for F2015 and W2016 1 Math 241A-B Homework Problem List for F2015 W2016 1.1 Homework 1. Due Wednesday, October 7, 2015 Notation 1.1 Let U be any set, g be a positive function on U, Y be a normed space. For any f : U Y let

More information

Lectures 9-10: Polynomial and piecewise polynomial interpolation

Lectures 9-10: Polynomial and piecewise polynomial interpolation Lectures 9-1: Polynomial and piecewise polynomial interpolation Let f be a function, which is only known at the nodes x 1, x,, x n, ie, all we know about the function f are its values y j = f(x j ), j

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

THESIS. Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University

THESIS. Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University The Hasse-Minkowski Theorem in Two and Three Variables THESIS Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University By

More information

1 Matrices and Systems of Linear Equations

1 Matrices and Systems of Linear Equations March 3, 203 6-6. Systems of Linear Equations Matrices and Systems of Linear Equations An m n matrix is an array A = a ij of the form a a n a 2 a 2n... a m a mn where each a ij is a real or complex number.

More information

Fundamentals of Linear Algebra. Marcel B. Finan Arkansas Tech University c All Rights Reserved

Fundamentals of Linear Algebra. Marcel B. Finan Arkansas Tech University c All Rights Reserved Fundamentals of Linear Algebra Marcel B. Finan Arkansas Tech University c All Rights Reserved 2 PREFACE Linear algebra has evolved as a branch of mathematics with wide range of applications to the natural

More information

We describe the generalization of Hazan s algorithm for symmetric programming

We describe the generalization of Hazan s algorithm for symmetric programming ON HAZAN S ALGORITHM FOR SYMMETRIC PROGRAMMING PROBLEMS L. FAYBUSOVICH Abstract. problems We describe the generalization of Hazan s algorithm for symmetric programming Key words. Symmetric programming,

More information