Nonparametric Density Estimation and Monotone Rearrangement


Nonparametric Density Estimation and Monotone Rearrangement

Robben Riksen

July 16, 2014

Bachelor Thesis
Supervisor: dr. A.J. van Es

Korteweg-de Vries Institute for Mathematics
Faculty of Natural Sciences, Mathematics and Informatics
University of Amsterdam


Abstract

In nonparametric density estimation, the method of kernel estimators is commonly used. However, if extra information about the monotonicity of the target density is available, kernel estimation does not take this into account and will in most cases not give a monotone estimate. A recently appeared method in this field of research is monotone rearrangement, which uses this extra information to improve the density estimate and has some interesting properties. In this thesis, nonparametric density estimation by means of kernel estimators is discussed. The reflection method is introduced to improve kernel estimation when the target density equals zero on the negative real line and has a discontinuity at zero. In the case where information about monotonicity of the target density is available, monotone rearrangement is used to monotonize the original estimate. The theoretical and asymptotic properties of the reflection method and monotone rearrangement are discussed briefly, after which simulations in Matlab are run to measure the performance, in terms of MISE, of both methods separately and applied together. It appears that both the reflection method and monotone rearrangement greatly reduce the MISE of the kernel estimator, but when applied together, the MISE remains at the level of that of the reflection method alone.

Cover image: a plot of an exponential-1 density (dashed-dotted line), a kernel estimate based on a sample of size n = 100 (dashed line) and its rearranged estimate (solid line).

Title: Nonparametric Density Estimation and Monotone Rearrangement
Author: Robben Riksen, robben.riksen@student.uva.nl
Supervisor: dr. A.J. van Es
Second grader: dr. I.B. van Vulpen
Date: July 16, 2014

Korteweg-de Vries Institute for Mathematics
University of Amsterdam
Science Park 904, 1098 XH Amsterdam


Contents

Introduction
1. Theory
   1.1. Kernel Estimation of Smooth Densities
        1.1.1. The Histogram Method
        1.1.2. Kernel Estimators
   1.2. Defining Errors
   1.3. Bandwidth Selection
   1.4. Decreasing Densities
        1.4.1. Reflection Method
   1.5. Monotone Rearrangement
        1.5.1. Introducing Monotone Rearrangement
        1.5.2. Properties of Monotone Rearrangement
2. Results
   2.1. Comparing Methods
   2.2. Simulations
3. Conclusion
4. Discussion and Review of the Progress
5. Populaire Samenvatting
References
A. Theorems and Proofs
B. MATLAB code
   B.1. Epanechnikov Kernel Estimator
   B.2. Gaussian Kernel Estimator
   B.3. Monotone Rearrangement
   B.4. Simulations
C. Samples
   C.1. Sample 1
   C.2. Sample 2


Introduction

When I returned from my semester abroad at the University of Edinburgh, I had made up my mind about the master's degree I wanted to pursue after graduating from my bachelor's degree: it had to be the master Stochastics and Financial Mathematics at the University of Amsterdam. For a longer time, my interest had gone out to statistics, so writing my bachelor thesis in this field of research was a logical step in the preparation for this master's programme. When approached, Dr. Van Es suggested a thesis about monotone rearrangement, a method of improving density estimates of monotone probability densities. Being such a recently developed method in this application, it sparked my interest and we decided it would be a suitable subject for a bachelor thesis.

A common challenge in statistics is the estimation of an unknown density function from a given data set. For example, this can be valuable to gain information about the occurrence of certain events in a population. Estimating the probability density from which such a data set is sampled can be done in a parametric and a nonparametric way. If we assume or know that a sample is drawn from, say, a N(µ, σ²) distribution, we apply parametric density estimation to estimate µ and/or σ. There are many ways of doing this, but the produced estimate will be normally distributed and its accuracy can only be improved by varying the parameters. If the assumption was right, this parametric way of density estimation is satisfactory. But if we do not know whether the density we want to estimate belongs to a known family of parameterized probability densities, the parametric approach obviously is not satisfactory. In nonparametric density estimation, these kinds of assumptions are not made. We only assume the target density exists, and make no assumptions about a parametric family to which the target density might, or might not, belong. Again, there are many ways to obtain a density function from observed data in a nonparametric way, but the most common way to do this is with kernel estimators, the method we will discuss in this thesis.

The method of kernel estimation is designed to estimate smooth densities, but can also be applied to densities with discontinuities. We will take a closer look at densities that are equal to zero on the negative real line, and therefore have a discontinuity at zero, the boundary. To improve the kernel estimator in this case, the commonly used reflection method is applied. This method, introduced by Schuster (1985) [1], reflects the data set around zero, that is, adds the negatives of the data points to the data set and applies the kernel estimation method to this new, bigger set of data points. This greatly improves the kernel estimation method.

Suppose now there is some extra information about the target density: we know the density we want to estimate is monotonically decreasing, or increasing. The availability

of such information occurs often. For example, in biometrics, age-height charts, in which the expected height of children is plotted against their age, should be monotonic in age. In econometrics, demand functions are monotonic in price (both examples from [2]). And there are many other examples in which a monotone density is expected. In these cases, kernel estimation still works fine. But the method of kernel estimators does not take this extra information into account, and will in most cases not produce a monotone estimate. Wanting to produce the best possible estimate from our data set, we would like to use all information available to minimize our estimation error. To monotonize the density found by kernel estimation, the method of monotone rearrangement is introduced. This relatively simple method, essentially rearranging function values in decreasing or increasing order, appeared first in the context of variational analysis in 1952 in the book Inequalities by Hardy, Littlewood and Pólya [3], but was only very recently introduced in the context of density estimation. Monotone rearrangement, however, proved to have many interesting properties in this context. This was the motivation to look into this subject further in this thesis.

The goal of this thesis is to become familiar with the methods of kernel estimation and monotone rearrangement, and to investigate whether a combination of methods designed to improve kernel estimation improves the estimate of a target density even more. Therefore, we will apply the reflection method and monotone rearrangement together to estimate a monotone probability density from a generated sample. This leads to the following research question:

Does the combination of the reflection method and monotone rearrangement significantly improve the estimation of a monotonic density from a given data sample?

In an attempt to answer this question, these methods were implemented in Matlab and simulations were run with different sample sizes.

This thesis starts with an overview of the method of kernel estimators, introducing them in an intuitive way. Then, measures of how well an estimator estimates the target density are introduced and discussed, after which several problems with the original kernel estimators are pointed out in the case where they are used to estimate target densities with a discontinuity at zero. A solution is proposed in the form of the reflection method. After that, the method of monotone rearrangement is introduced and some of its properties are discussed. Then simulations are run to answer the main question of this thesis. The results of the simulations are discussed and possible improvements to the simulation and estimation methods are proposed.

1. Theory

This chapter starts with introducing the concept of kernel estimation in an intuitive way. The reflection method [1] is introduced as a method to improve kernel estimation for densities that have a bounded support. An example of such a density is the exponential density, which is zero for all points on the negative real axis and has a discontinuity at 0. Then monotone rearrangement is introduced and applied, to take into account extra information about monotonicity of the target density. Throughout this chapter, unless stated otherwise, the same data set with 100 points drawn from a N(µ, σ²) distribution, with µ = 2 and σ² = 1/2, will be used in the examples to illustrate the theory. The values of these data points can be found in Appendix C.1. Furthermore, in general descriptions and theory it will be assumed that a sample of n independent, identically distributed observations X_1, X_2, ..., X_n is given, for which the density f will be estimated.

1.1. Kernel Estimation of Smooth Densities

In this section we will introduce the method of kernel estimation, starting with the histogram method as a density estimator. We then expand this idea to introduce kernel estimators and find a smooth estimate of the target density.

1.1.1. The Histogram Method

Probably the most intuitive way of representing the probability density from which a data set is sampled is by use of a histogram. To do this, the bin width h and starting point x_0 are chosen. The bins are defined as the half-open intervals [x_0 + (m-1)h, x_0 + mh), for m ∈ Z. Then the histogram density approximation is defined as

\hat{f}(x) = \frac{1}{nh} \cdot \#\{X_i \text{ in the same bin as } x\}.

For the data set from Appendix C.1 this definition, with starting point x_0 = 0 and bin width h = , results in the density \hat{f} as depicted in figure 1.1. Clearly, the histogram method for approximating the density is not at all satisfactory. The approximation is not a continuous function, but the major drawback is that the choice of the starting point x_0 has a big influence on the approximated density, as is shown in figure 1.2. There the starting point is shifted to the left by 0.10 to x_0 = -0.1 while the bin width is kept the same. This results in the peak extending less far to the right, and the structure separated from the rest of the bins visible in figure 1.1 has almost disappeared in figure 1.2.
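To make the definition concrete, a minimal Matlab sketch of this histogram estimator could look as follows. This is an illustration with our own function and variable names, not the code from Appendix B.

    % Minimal sketch of the histogram density estimator defined above (an
    % illustration, not the Appendix B code). x is a vector of evaluation
    % points, X the sample, x0 the starting point and h the bin width.
    % (Saved as hist_density.m.)
    function fhat = hist_density(x, X, x0, h)
        n = length(X);
        fhat = zeros(size(x));
        for j = 1:length(x)
            m  = floor((x(j) - x0) / h);              % index of the bin containing x(j)
            lo = x0 + m * h;                          % that bin is [lo, lo + h)
            fhat(j) = sum(X >= lo & X < lo + h) / (n * h);
        end
    end

For example, figure 1.1 corresponds to a call of the form hist_density(x, X, 0, h) with the sample from Appendix C.1.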

Figure 1.1.: Histogram for the data set from Appendix C.1, with starting point x_0 = 0 and bin width h = .

Figure 1.2.: Histogram for the data set from Appendix C.1, with starting point x_0 = -0.1 and bin width h = .

If we now, instead of placing the data points in bins, place bins around the data points, we get rid of the problem of the starting point. The only variables then are the shape of the bin and the bin width h. This idea leads to the definition of kernel estimators, which we introduce below.

1.1.2. Kernel Estimators

A far more satisfactory method to estimate the probability density is a family of methods based on kernel estimators. First a kernel function K, a function that places weight on (and around) the observations, has to be introduced. This kernel function, or simply kernel, K has to integrate to 1 over the real line. Now the kernel estimator can be defined as follows.

Definition 1.1. Let X_1, X_2, ..., X_n be independent, identically distributed observations with density f. A kernel estimator \hat{f} of the target density f, with kernel K, is defined by

\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right),   (1.1)

where n is the number of data points and, from now on, h is called the bandwidth.

A kernel K is said to be of order m if

\int x^{j} K(x)\,dx = 0 \quad \text{for } 0 < j < m, \qquad \text{and} \qquad \int x^{m} K(x)\,dx \neq 0.   (1.2)

The most commonly used symmetric kernels are of order 2, because kernels of higher order can make the estimate take negative values in some areas, which is in most cases not desirable when estimating a probability density. An example of a commonly used kernel function is the standard normal density function. To visualize the working of a kernel estimator using the normal density as a kernel, imagine small normal densities placed over the data points X_i that add up to the estimated probability density. This is illustrated in figure 1.3, where we used the first 4 data points from sample 1 in Appendix C.1 and bandwidth h = 0.5 to construct the density estimate \hat{f}.

Varying the bandwidth h has big effects on the density estimate \hat{f}. Choosing h too small will result in too much weight being given to single data points in the tail of the distribution, while choosing h too big will result in oversmoothing and might conceal structures in the target density. Therefore, the selected bandwidth largely determines the behaviour of the kernel estimator. This is illustrated by figure 1.4, where the bandwidth h = 0.3 is chosen. A (non-existent) structure around x = 3, invisible before, has appeared because the bandwidth was chosen too small.
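Definition 1.1 translates almost directly into code. The following Matlab sketch (our own illustration, not the Appendix B implementation) evaluates the estimator on a grid, with the kernel passed in as a function handle.

    % Minimal sketch of the kernel estimator (1.1). x is a grid of evaluation
    % points, X the sample, h the bandwidth and K a kernel function handle.
    % (Saved as kde.m.)
    function fhat = kde(x, X, h, K)
        n = length(X);
        fhat = zeros(size(x));
        for i = 1:n
            fhat = fhat + K((x - X(i)) / h);          % add one kernel per data point
        end
        fhat = fhat / (n * h);
    end

The estimate in figure 1.3 corresponds, up to the exact data points used, to a call like fhat = kde(x, X(1:4), 0.5, @(t) exp(-0.5*t.^2)/sqrt(2*pi)).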

Figure 1.3.: Kernel estimator for the N(2, 0.5) distribution from 4 data points with bandwidth h = 0.5, showing the individual kernels.

Figure 1.4.: Kernel estimator for the N(2, 0.5) distribution from 4 data points with bandwidth h = 0.3, showing the individual kernels.

Another property of a kernel estimator that determines its accuracy is the choice of the kernel. A huge variation in the choice of the kernel used in kernel estimation is possible. A commonly used symmetric kernel is the Gaussian kernel we used in figure 1.3:

K(t) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} t^2}.

Another commonly used kernel is the Epanechnikov kernel K_e:

K_e(t) = \begin{cases} \frac{3}{4\sqrt{5}}\left(1 - \frac{t^2}{5}\right) & \text{if } |t| \leq \sqrt{5}, \\ 0 & \text{otherwise.} \end{cases}   (1.3)

This kernel, sometimes called the optimal kernel, was introduced by Epanechnikov (1969) [4] and is the most efficient kernel function in terms of minimizing the MISE, although the difference compared to other kernels is relatively small. Its drawback, on the other hand, is that it is not continuously differentiable.

The kernels we just discussed are the same for all data points. Variable kernels can also be introduced, in which the shape of the kernel depends on the position of the data point around which it is placed, or on its distance to other data points. Depending on the information known about the data set, this can greatly increase the accuracy of the estimate.

1.2. Defining Errors

To compare different estimation methods and to draw conclusions about which method improves the estimation, it is necessary to define what measures of error will be used to compare different techniques. In the derivations in this section, the line of Density Estimation for Statistics and Data Analysis, by Silverman (1986) [5], will be followed. For the error of the estimate in a single point, the mean square error is defined in the usual way.

Definition 1.2. The mean square error is given by

MSE_x(\hat{f}) := E[\hat{f}(x) - f(x)]^2,   (1.4)

where \hat{f}(x) is the estimator of the density f.

It is easy to see, using the expression for the variance of \hat{f}, Var \hat{f}(x) = E\hat{f}^2(x) - (E\hat{f}(x))^2, that the MSE can be written in the following way:

MSE_x(\hat{f}) = (E\hat{f}(x) - f(x))^2 + Var \hat{f}(x).   (1.5)

We will now define the bias of \hat{f}.

Definition 1.3. Let \hat{f}(x) be the estimate of the density f. Then

b_{\hat{f}}(x) := E\hat{f}(x) - f(x)   (1.6)

is called the bias of \hat{f}.
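Both kernels are easily written as one-line function handles for use with the kde sketch from section 1.1.2; the Epanechnikov handle below matches the form in (1.3). This is an illustration only; the thesis's own, more complete implementations are in Appendix B.1 and B.2.

    % Gaussian and Epanechnikov kernels as function handles, usable with the
    % kde sketch from section 1.1.2; both are vectorized in t.
    Kgauss = @(t) exp(-0.5 * t.^2) / sqrt(2 * pi);
    Kepan  = @(t) (3 / (4 * sqrt(5))) * (1 - t.^2 / 5) .* (abs(t) <= sqrt(5));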

So in a single point, the mean square error is given by a bias term, representing the systematic error of \hat{f}, and a variance term. To quantify the error on the whole domain of f, the mean integrated square error is introduced.

Definition 1.4. The mean integrated square error is given by

MISE(\hat{f}) := E \int \left[\hat{f}(x) - f(x)\right]^2 dx.   (1.7)

The MISE is a commonly used way of quantifying the error in an estimate. Now, because the integrand is non-negative, the expectation and the integral can be interchanged, and the MISE takes the following form [6]:

MISE(\hat{f}) = E \int \left[\hat{f}(x) - f(x)\right]^2 dx = \int MSE_x(\hat{f})\,dx = \int \left[E\hat{f}(x) - f(x)\right]^2 dx + \int Var\,\hat{f}(x)\,dx = \int b_{\hat{f}}^2(x)\,dx + \int Var\,\hat{f}(x)\,dx.   (1.8)

Thus, the MISE is the sum of the integrated square bias and the integrated variance. If in this definition \hat{f} is taken as in definition 1.1, its expectation is [7]

E\hat{f}(x) = \int \frac{1}{h} K\left(\frac{x - y}{h}\right) f(y)\,dy,   (1.9)

and its variance can be found as

Var\,\hat{f}(x) = Var\left[\frac{1}{n}\sum_{i=1}^{n} \frac{1}{h} K\left(\frac{x - X_i}{h}\right)\right] = \frac{1}{n^2}\sum_{i=1}^{n} Var\left[\frac{1}{h} K\left(\frac{x - X_i}{h}\right)\right] = \frac{1}{n}\left(E\left[\frac{1}{h^2} K^2\left(\frac{x - X_i}{h}\right)\right] - \left(E\left[\frac{1}{h} K\left(\frac{x - X_i}{h}\right)\right]\right)^2\right) = \frac{1}{n}\int \frac{1}{h^2} K^2\left(\frac{x - y}{h}\right) f(y)\,dy - \frac{1}{n}\left[\int \frac{1}{h} K\left(\frac{x - y}{h}\right) f(y)\,dy\right]^2.   (1.10)

Written in this form, it becomes clear that the bias term in the MISE does not directly depend on the sample size, so this term will not be reduced just by taking larger samples.
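The decomposition (1.5) can also be checked numerically by Monte Carlo: repeatedly draw a sample, evaluate the estimator at a fixed point, and combine the empirical squared bias and variance. The sketch below does this for the N(2, 1/2) example density; the point x0, the bandwidth, the sample size and the number of replications are arbitrary choices of ours.

    % Monte Carlo illustration of MSE = bias^2 + variance, cf. (1.5), at a
    % single point x0 for the N(2, 1/2) density used in the examples. All
    % tuning values (n, h, B, x0) are arbitrary choices for the illustration.
    rng(1);                                           % reproducible pseudo-random numbers
    f  = @(x) exp(-(x - 2).^2) / sqrt(pi);            % N(2, 1/2) density
    K  = @(t) exp(-0.5 * t.^2) / sqrt(2 * pi);        % Gaussian kernel
    x0 = 2; n = 100; h = 0.3; B = 1000;
    est = zeros(B, 1);
    for b = 1:B
        X = 2 + sqrt(0.5) * randn(n, 1);              % sample of size n from N(2, 1/2)
        est(b) = sum(K((x0 - X) / h)) / (n * h);      % kernel estimate at x0
    end
    mse = (mean(est) - f(x0))^2 + var(est);           % squared bias plus variance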

From now on, it will be assumed that the unknown density has continuous derivatives of all required orders and that the kernel function K used to define \hat{f} is a symmetric function with the following properties:

\int K(x)\,dx = 1,   (1.11)
\int x K(x)\,dx = 0,   (1.12)
\int x^2 K(x)\,dx = k_2 \neq 0.   (1.13)

In other words, we are dealing with a symmetric kernel of order 2. These properties are met by most commonly used kernel functions, for in many cases the kernel K is a symmetric probability density with variance k_2. The kernel functions used in the simulations later on in this thesis will also satisfy these properties.

To calculate the MISE of an estimate, it is useful to derive a more applicable (approximate) form of the bias and variance. So for the rest of this section we will take on a heuristic approach. We can rewrite the bias in the following way:

b_{\hat{f}}(x) = E\hat{f}(x) - f(x) = \int \frac{1}{h} K\left(\frac{x - y}{h}\right) f(y)\,dy - f(x) = \int K(t) f(x - ht)\,dt - f(x)   (1.14)
= \int K(t)\left[f(x - ht) - f(x)\right] dt,   (1.15)

where at (1.14) the change of variable y = x - ht is applied and at (1.15) it is used that K integrates to unity. If the Taylor expansion

f(x - ht) = f(x) - ht f'(x) + \frac{1}{2} h^2 t^2 f''(x) - \ldots   (1.16)

is substituted in expression (1.15), the following expression for the bias is found:

b_{\hat{f}}(x) = -h f'(x) \int t K(t)\,dt + \frac{h^2}{2} f''(x) \int t^2 K(t)\,dt - \ldots = \sum_{p=1}^{\infty} (-1)^p h^p b^{(p)}(x),   (1.17)

where b^{(p)}(x) = \frac{f^{(p)}(x)}{p!} \int t^p K(t)\,dt. Using properties (1.12) and (1.13) of K, the bias can be approximated by

b_{\hat{f}}(x) = h^2 b^{(2)}(x) + O(h^4) \approx h^2 b^{(2)}(x).   (1.18)

So the integrated square bias can be approximated by

\int b_{\hat{f}}^2(x)\,dx \approx \frac{1}{4} h^4 k_2^2 \int (f''(x))^2\,dx.   (1.19)

In a similar manner, an approximation of the variance of \hat{f} can be found. Substitution of expression (1.9) into (1.10) gives

Var\,\hat{f}(x) = \frac{1}{n} \int \frac{1}{h^2} K^2\left(\frac{x - y}{h}\right) f(y)\,dy - \frac{1}{n}\left[f(x) + b_{\hat{f}}(x)\right]^2.

The substitution y = x - ht in the integral and approximation (1.18) for the bias lead to the following approximation of the variance:

Var\,\hat{f}(x) \approx \frac{1}{nh} \int K^2(t) f(x - ht)\,dt - \frac{1}{n}\left[f(x) + O(h^2)\right]^2.

With the Taylor expansion (1.16) and the assumption that n is large and h is small, this can be rewritten as

Var\,\hat{f}(x) \approx \frac{1}{nh} \int K^2(t)\left\{f(x) - ht f'(x) + \ldots\right\} dt + O(n^{-1}) = \frac{1}{nh} f(x) \int K^2(t)\,dt + O(n^{-1}) \approx \frac{1}{nh} f(x) \int K^2(t)\,dt.   (1.20)

Because f is a probability density, \int f(x)\,dx = 1, so the integral of the variance of \hat{f} can be approximated by

\int Var\,\hat{f}(x)\,dx \approx \frac{1}{nh} \int K^2(t)\,dt.   (1.21)

This means we can approximate the MISE by [6]

MISE = \int b_{\hat{f}}^2(x)\,dx + \int Var\,\hat{f}(x)\,dx \approx \frac{1}{4} h^4 k_2^2 \int (f''(x))^2\,dx + \frac{1}{nh} \int K^2(t)\,dt.   (1.22)
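All ingredients of (1.22) can be computed numerically for a concrete kernel and density. The sketch below does this for the Gaussian kernel and the N(2, 1/2) density from the examples, and also gives the bandwidth that minimizes the resulting expression, anticipating the optimal bandwidth of section 1.3. Grids and names are our own choices, not taken from the thesis.

    % Numerical evaluation of the AMISE approximation (1.22) for the Gaussian
    % kernel and the N(2, 1/2) example density, plus the bandwidth minimizing it.
    t     = linspace(-10, 10, 2001);
    K     = @(s) exp(-0.5 * s.^2) / sqrt(2 * pi);
    k2    = trapz(t, t.^2 .* K(t));                   % int t^2 K(t) dt (equals 1 here)
    RK    = trapz(t, K(t).^2);                        % int K(t)^2 dt
    x     = linspace(-4, 8, 2001);
    f     = exp(-(x - 2).^2) / sqrt(pi);              % N(2, 1/2) density on the grid
    dx    = x(2) - x(1);
    fpp   = gradient(gradient(f, dx), dx);            % numerical second derivative
    Rf2   = trapz(x, fpp.^2);                         % int (f''(x))^2 dx
    amise = @(h, n) 0.25 * h.^4 * k2^2 * Rf2 + RK ./ (n * h);
    h_opt = @(n) (RK ./ (k2^2 * Rf2 * n)).^(1/5);     % minimizer of the AMISE in h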

The last expression (1.22) is known as the asymptotic mean integrated square error, AMISE, and is a useful approximation to the MISE when dealing with large samples; it is much easier to calculate than the MISE in equation (1.8) [8]. It now becomes clear that decreasing the bandwidth h, whilst reducing the integrated square bias term in (1.8), increases the integrated variance term. This means that finding the optimal bandwidth will always be a trade-off between random and systematic error. This is known as the variance-bias trade-off. More about bandwidth selection will be said in section 1.3.

Besides the MISE as a measure of how well a density estimator approaches the target density, the asymptotic properties of the estimator are also important. Prakasa Rao (1983) [6] proved that there is no reasonable estimator \hat{f}_n(x) such that

E\left[\hat{f}_n(x)\right] = f(x),   (1.23)

so we are forced to look for asymptotically unbiased estimators. That is, a sequence of density estimators \hat{f}_n is asymptotically unbiased if, for every density f and every x,

\lim_{n \to \infty} E_f\left[\hat{f}_n(x)\right] = f(x).   (1.24)

We call a sequence of density estimators \hat{f}_n weakly consistent if

\hat{f}_n(x) \xrightarrow{p} f(x) \quad \text{as } n \to \infty.   (1.25)

Both definitions are valuable when looking for an appropriate density estimator.

1.3. Bandwidth Selection

As stated before, a good choice of the bandwidth h in a kernel estimator is very important, for the bandwidth largely determines the behaviour of the estimator. We define the optimal bandwidth as the bandwidth that minimizes the AMISE (1.22). In this thesis we will not go into detail about different methods to determine the optimal bandwidth, but we will just state the theoretical asymptotically optimal bandwidth. Parzen (1962) [9] showed that the asymptotically optimal bandwidth for a twice continuously differentiable density f is equal to

h_n^{opt} = k_2^{-2/5} \left(\int K(t)^2\,dt\right)^{1/5} \left(\int f''(x)^2\,dx\right)^{-1/5} n^{-1/5},   (1.26)

where k_2 is the same as in equation (1.13). In this thesis, however, we will be dealing with densities with discontinuities. In Van Eeden (1985) [10] the asymptotically optimal

bandwidth is derived for non-smooth densities. Let D be the set of discontinuity points of f; then the asymptotically optimal bandwidth is given by

h_n^{opt} = \left(\frac{\int K(t)^2\,dt}{\int_0^{\infty}\left(\int_t^{\infty} K(u)\,du\right)^2 dt}\right)^{1/2} \left(\sum_{d \in D} (f(d+) - f(d-))^2\right)^{-1/2} n^{-1/2},   (1.27)

where f(d+) = \lim_{h \downarrow 0} f(d + h) and f(d-) = \lim_{h \downarrow 0} f(d - h). So we conclude that for smooth densities the optimal bandwidth is of order n^{-1/5}, and for discontinuous densities the optimal bandwidth is of order n^{-1/2}. In practice, for finite samples, the optimal bandwidth has to be approximated. There are many methods of finding a suitable bandwidth, but in the simulations later on in this thesis we will use the Sheather and Jones bandwidth selection procedure [11].

Now that we have defined the measures by which we can judge the quality of a density estimator and know what the theoretical optimal bandwidths are, we can move on to the different methods of improving the estimator found by kernel estimation.

1.4. Decreasing Densities

When a probability density function is monotonically decreasing, the lower bound of the interval on which it is positive has to exist; otherwise the function could never be a probability density, for it would not integrate to unity. So the density function takes positive (non-zero) values on an interval [a_l, ∞) and is equal to zero on the interval (-∞, a_l). This means that, for a monotonically decreasing density (taking a_l = 0 from here on), there has to be a discontinuity at x = 0. However, most kernel estimators will not correct for this discontinuity and will have positive values on the interval where the target density equals zero. Especially at this boundary, the density estimate will be inaccurate and not even consistent with the target density. The problems induced by such a boundary are called boundary effects (or boundary constraints). Kernel estimators not taking these boundary constraints into account we will call unconstrained estimators. In this section we will discuss the reflection method, a method that corrects the kernel density estimate for these boundary effects.

1.4.1. Reflection Method

In the previous sections, the kernel density method was introduced as a method to estimate a target density that satisfies certain smoothness criteria. For example, in the derivation of the AMISE (1.22) we assumed the target density f to have a continuous second derivative on the whole real line. In many cases densities are used that do not satisfy these conditions. A common example of such a density is the exponential-λ density, f(x) = λ e^{-λx} 1_{\{x \geq 0\}}, which has an obvious discontinuity at x = 0 and positive values only on the positive real line. From figure 1.5, in which an exponential-1 density is

estimated by a kernel estimator with an Epanechnikov kernel from a sample of n = 100 observations (sample 2 in Appendix C.2), it becomes clear that the estimator gives too little weight to the points close to the boundary on the positive x-axis and too much (namely any) to points on the negative axis.

Figure 1.5.: Kernel density estimate with Epanechnikov kernel with bandwidth h = 0.7 (dashed line) of an exponential-1 density (solid line).

So such boundary effects influence the performance of the estimator near the boundary [8], [12]. To quantify this, suppose f is a density with f(x) = 0 for x < 0 and f(x) > 0 for x ≥ 0, which has a continuous second derivative away from x = 0. Let \hat{f} be a kernel estimator of f based on a kernel K with support [-1, 1] and bandwidth h. We express a point x as x = ch, h → 0. With the same change of variable as in equation (1.14) we find for the expectation of \hat{f}

E\hat{f}(x) = \int_{-1}^{c} K(t) f(x - ht)\,dt.   (1.28)

Now if c ≥ 1, that is x ≥ h, we saw in equation (1.18) that

E\hat{f}(x) = f(x) + h^2 b^{(2)}(x) + O(h^4),   (1.29)

for a kernel of order 2. If on the other hand 0 ≤ c < 1, we find after Taylor expansion of f(x - ht) around x

E\hat{f}(x) = f(x) \int_{-1}^{c} K(t)\,dt + o(h).   (1.30)

Because in general \int_{-1}^{c} K(t)\,dt \neq 1, \hat{f} is not consistent in points close to x = 0. By the assumption that the kernel K is symmetric, we find E\hat{f}(0) = \frac{1}{2} f(0) + o(h) at the boundary x = 0. Intuitively, this can be explained by the fact that the kernel estimate is a smooth function on the whole real line, while the target density f it estimates is not continuous at x = 0.

There exist multiple methods to create a consistent kernel estimator near the boundary. In this section we will take a closer look at the reflection method, as introduced by Schuster (1985) [1]. All observations are mirrored in x = 0, so to the set of observations X_1, ..., X_n the observations -X_1, ..., -X_n are added. Then the kernel estimator is applied to this new set of observations, for x ∈ [0, ∞). This method can also be written in the form of a kernel estimator on X_1, ..., X_n for x ∈ [0, ∞) as

\hat{f}_R(x) = \frac{1}{nh} \sum_{i=1}^{n} \left\{K\left(\frac{x - X_i}{h}\right) + K\left(\frac{x + X_i}{h}\right)\right\}.   (1.31)

Note that \hat{f}_R(x) integrates to 1 on the positive real axis and is a density. If we apply the reflection method to the same set of observations that was used in figure 1.5, this results in figure 1.6. At first sight \hat{f}_R(x) already is a better estimator of the target density f near the boundary, and it no longer gives any weight to negative values of x. This results in an asymptotically unbiased estimate of f. Following the same steps as before,

E\hat{f}_R(x) = E\left[\frac{1}{nh} \sum_{i=1}^{n} \left\{K\left(\frac{x - X_i}{h}\right) + K\left(\frac{x + X_i}{h}\right)\right\}\right] = E\left[\frac{1}{h} K\left(\frac{x - X}{h}\right)\right] + E\left[\frac{1}{h} K\left(\frac{x + X}{h}\right)\right] = \int \frac{1}{h} K\left(\frac{x - y}{h}\right) f(y)\,dy + \int \frac{1}{h} K\left(\frac{x + y}{h}\right) f(y)\,dy = \int K(t) f(x - ht)\,dt + \int K(t) f(-x + ht)\,dt,   (1.32)

where in the first integral the change of variable y = x - ht is applied, and in the second integral y = -x + ht. If we again express a point x close to the boundary as x = ch, h → 0, c ∈ [0, 1), and use that f(x) = 0 for x < 0 and K(t) = 0 for t < -1 and t > 1, we find

Figure 1.6.: Reflection method with Epanechnikov kernel with bandwidth h = 0.7 (dashed line) of an exponential-1 density (solid line).

E\hat{f}_R(x) = \int_{-1}^{c} K(t) f(x - ht)\,dt + \int_{c}^{1} K(t) f(-x + ht)\,dt = f(x) \int_{-1}^{c} K(t)\,dt + f(-x) \int_{c}^{1} K(t)\,dt + o(h),   (1.33)

using Taylor expansions of f(x - ht) around x and of f(-x + ht) around -x. So for x = 0 the expectation becomes f(0) + o(h), and we see that the estimator \hat{f}_R(x) is asymptotically unbiased at the boundary. To achieve a smaller bias of order O(h^2) near the boundary, the generalized reflection method was introduced by Karunamuni and Alberts (2006) [13], but this is beyond the scope of this thesis.
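A minimal sketch of the reflection estimator (1.31), reusing the kde sketch from section 1.1.2, could look as follows; the function name is ours and this is an illustration, not the thesis's Appendix B code. For x < 0 the estimate is simply taken to be zero, in line with the support of the target density.

    % Sketch of the reflection estimator (1.31): apply the ordinary kernel sum
    % to the enlarged sample {X_i, -X_i} and rescale, evaluating for x >= 0.
    % (Saved as kde_reflect.m; kde is the sketch from section 1.1.2.)
    function fhatR = kde_reflect(x, X, h, K)
        fhatR = 2 * kde(x, [X(:); -X(:)], h, K);      % 2n points, so multiply by 2
    end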

1.5. Monotone Rearrangement

In some cases there is additional information about the target density. If it is known that the probability density is a monotone function, it is desirable to use this extra information to find a density estimate that matches the original density better. Clearly, the kernel estimation method discussed in the previous sections does not take this information into account. In this section monotone rearrangement is introduced, a method that adapts the density estimate obtained by kernel estimation so that it respects the known monotonicity of the target density.

In the remainder of this chapter, unless stated otherwise, f will be the target density and \hat{f} the not necessarily monotone estimate of f, for example generated by a kernel method as discussed in the previous sections.

1.5.1. Introducing Monotone Rearrangement

Let U be uniformly distributed on a compact interval A = [a_l, a_u] in R, and let g: A → R be a strictly increasing and differentiable function. Then the distribution function F_g(y) of g(U) is proportional to [14]

F_g(y) = \int_{a_l}^{a_u} 1_{\{g(u) \leq y\}}\,du + a_l = g^{-1}(y), \quad y \in g([a_l, a_u]).   (1.34)

In the case where g is not a strictly increasing function, the distribution function is still given by the integral above, and we define the isotone increasing rearrangement as the generalized inverse of the distribution function:

g^{\uparrow}(x) = \inf\{y \in \mathbb{R} \mid F_g(y) \geq x\}.   (1.35)

In other words, g^{\uparrow} is the quantile function of F_g, and F_g(y) is the measure of the set on which g(u) \leq y [3]. In a similar way the isotone decreasing rearrangement can be defined, with distribution function

F_g(y) = \int_{a_l}^{a_u} 1_{\{g(u) \geq y\}}\,du + a_l, \quad y \in g([a_l, a_u]),   (1.36)

and decreasing rearrangement

g^{\downarrow}(x) = \inf\{y \in \mathbb{R} \mid F_g(y) \geq x\}.   (1.37)

This reasoning is valid for every compact interval A in R, so for reasons of convenience A = [0, 1] is used from now on. In section 1.5.2 it is shown that the rearrangement of \hat{f}, as defined in (1.35) and (1.37), always gives a better approximation of the monotone target density when applied to a non-monotonic estimate.

To illustrate the principle of monotone rearrangement, a simple example is studied. The method of monotone rearrangement as described above is applied to the function

g(x) = (2x - 1)^2, \quad x \in [0, 1].   (1.38)

Clearly, this function is not monotone. Using definitions (1.34) and (1.35), the increasing monotone rearrangement of the function g is given by

g^{\uparrow}(x) := \inf\left\{y \in \mathbb{R} \;\middle|\; \int_0^1 1_{\{(2u-1)^2 \leq y\}}\,du \geq x\right\}.   (1.39)

So the rearrangement sorts the values of g on [0, 1] in ascending order. The result is plotted in figure 1.7.

Figure 1.7.: Increasing monotone rearrangement (dashed line) applied to the function g(x) = (2x - 1)^2 (solid line).

In the same way, using definition (1.37), the decreasing rearrangement is given by

g^{\downarrow}(x) = \inf\left\{y \in \mathbb{R} \;\middle|\; \int_0^1 1_{\{(2u-1)^2 \geq y\}}\,du \geq x\right\},   (1.40)

resulting in figure 1.8.

In section 1.4.1 we used the reflection method to improve the kernel estimator. From figure 1.6 it is clear that the resulting estimate is not a monotone decreasing function, while the target density is. Applying the isotone decreasing rearrangement as defined in (1.37) to the reflected estimate results in figure 1.9. If we plot the reflection estimate and its monotone rearrangement together and zoom in on the part around x = 2 where the reflection estimate is not monotone, we see how the monotone rearrangement method rearranges the function values to find a monotone estimate, as is shown in figure 1.10.

As we just saw, the monotone rearrangement method monotonizes the kernel density estimate to achieve a better estimate of the monotone target density. However, the rearrangements achieved by equations (1.35) and (1.37) will not always be continuously differentiable functions. For example, if we take a look at the function f(x) = x + \frac{1}{4}\sin(4\pi x)

Figure 1.8.: Decreasing monotone rearrangement (dashed line) applied to the function g(x) = (2x - 1)^2 (solid line).

Figure 1.9.: Decreasing monotone rearrangement applied to the reflection method with Epanechnikov kernel with bandwidth h = 0.7 (dashed line) of an exponential-1 density (solid line).

and its isotone rearrangement, as depicted in figure 1.11 below, we clearly see that the rearrangement is not everywhere continuously differentiable. In some cases, however, it

Figure 1.10.: Decreasing monotone rearrangement (dotted line) applied to the reflection method with Epanechnikov kernel (dashed line) of an exponential-1 density (solid line).

might be necessary to find a differentiable estimator of f. To find an everywhere differentiable increasing rearrangement, the indicator function in the distribution function (1.34) can be approximated by a kernel in the following way [14]. Let K_d be a positive kernel of order 2 with compact support [-1, 1] and h_d the corresponding bandwidth. Then

F_{g,h_d}(y) = \frac{1}{h_d} \int_0^1 \int_{-\infty}^{y} K_d\left(\frac{g(u) - v}{h_d}\right) dv\,du   (1.41)

is a smoothed version of the distribution function, and

g^{\uparrow}_{h_d}(x) = \inf\{y \in \mathbb{R} \mid F_{g,h_d}(y) \geq x\}   (1.42)

is called the smoothed increasing rearrangement of g. In the same way as before, the decreasing rearrangement can be defined via the smoothed distribution function and becomes

F_{g,h_d}(y) = \frac{1}{h_d} \int_0^1 \int_{y}^{\infty} K_d\left(\frac{g(u) - v}{h_d}\right) dv\,du,   (1.43)

Figure 1.11.: Increasing isotone rearrangement (dashed line) applied to the function f(x) = x + \frac{1}{4}\sin(4\pi x) (solid line).

g^{\downarrow}_{h_d}(x) = \inf\{y \in \mathbb{R} \mid F_{g,h_d}(y) \geq x\}.   (1.44)

For sufficiently small h_d, the smoothed rearrangements g^{\uparrow}_{h_d} and g^{\downarrow}_{h_d} of an unconstrained probability density are still probability densities, and they converge pointwise to the isotone rearrangements g^{\uparrow} and g^{\downarrow} respectively, as proved in Birke (2009) [14] (see Appendix A, theorem A.1, for the proof). As pointed out in Neumeyer (2007) [15], the smoothed and the isotone rearrangement share the same rate of convergence. Depending on the properties required of the estimator, one of these methods can be chosen. The smoothed rearrangement will be preferred if one requires a smooth estimator; for the isotone rearrangement, however, there is no need to choose a bandwidth h_d, and flat parts of g are better reflected by the isotone rearrangement. In the rest of this thesis and in the simulations, we will use the isotone rearrangement rather than the smoothed rearrangement.

1.5.2. Properties of Monotone Rearrangement

The method of monotone rearrangement has promising properties. Bennett and Sharpley (1988) [16] showed that the monotone rearrangement f^{\uparrow} of a function f has the same

L^p-norm, p ∈ [1, ∞), as the function f itself: \|f^{\uparrow}\|_p = \|f\|_p. Chernozhukov et al. (2009) [2] proved that monotone rearrangement of an unconstrained estimate of a monotone target function weakly decreases the estimation error in the L^p-norm, that is,

\|\hat{f}^{\uparrow} - f\|_p \leq \|\hat{f} - f\|_p

(see Appendix A, theorem A.3, for a proof). Taking p = 2, this implies that the MISE of the rearranged estimate will be at most that of the original unconstrained estimate. Neumeyer (2007) [15] stated that the smoothed rearrangement of an estimator \hat{f} is pointwise consistent if \hat{f} is pointwise consistent. Furthermore, she showed that the asymptotic behaviour of the rearrangement is the same as that of the original unconstrained estimator.

The above definitions of monotone rearrangement were only introduced for densities with a bounded support. For many purposes, as is the case with our example of the exponential-1 density, the support of the target density is unbounded. For decreasing densities it will often be of the form A = [a_l, ∞) and for increasing densities of the form A = (-∞, a_u]. Dette and Volgushev (2008) [17] showed that in these cases the rearrangements can be defined on an unbounded support and that the asymptotic behaviour of the rearrangement is the same as described above. Hence, we can safely use monotone rearrangement in the estimation of the exponential-1 density function.
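Numerically, the rearrangements are easy to compute on a fine equidistant grid: since all grid points carry the same weight, sorting the function values in decreasing (or increasing) order gives the decreasing (or increasing) rearrangement of the gridded estimate. A minimal sketch, anticipating the implementation used in chapter 2; the bandwidth and grid below are arbitrary choices of ours.

    % Decreasing rearrangement of a gridded density estimate by sorting,
    % applied to an unconstrained kernel estimate of the exponential-1 density.
    % kde is the sketch from section 1.1.2.
    x      = linspace(0, 6, 1001);                    % equidistant grid on [0, 6]
    X      = -log(rand(100, 1));                      % sample of size 100 from exponential-1
    K      = @(t) exp(-0.5 * t.^2) / sqrt(2 * pi);
    fhat   = kde(x, X, 0.4, K);                       % unconstrained estimate on the grid
    frearr = sort(fhat, 'descend');                   % its decreasing monotone rearrangement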

2. Results

2.1. Comparing Methods

The main point of this thesis is to investigate whether improving the kernel estimator by the reflection method, as introduced in section 1.4.1, before applying the method of monotone rearrangement results in a better estimate of a monotone target density. In the last section we saw that the asymptotic properties of the monotone rearrangement of an estimate are the same as those of the original estimate. So since the reflection method produces a consistent estimate of a density function with a discontinuity at x = 0, applying monotone rearrangement to this estimate will also produce a consistent estimate, and it will share the same asymptotic properties. In practice, however, one does not encounter samples of infinite size. Therefore it is valuable to look at the behaviour of the MISE of the methods for samples of smaller sizes. To measure the performance of the methods, simulations have been run in Matlab with different sample sizes. The density we will look at is the exponential-1 density we also used in section 1.4.1: f(x) = e^{-x} 1_{[0,\infty)}(x).

2.2. Simulations

In the simulations run in this thesis, we used a Sheather and Jones bandwidth selector. This selector was originally designed by Sheather and Jones (1991) [11] for smooth densities. In our case, we applied it to the exponential-1 density, which has a discontinuity at x = 0. Van Es and Hoogstrate (1997) [18] showed that the Sheather and Jones bandwidth selection method adapts slightly to the non-smoothness of the target density. The bandwidths produced by this method are of smaller order than n^{-1/5} (1.26), the optimal rate for smooth densities, but the optimal rate of n^{-1/2} (1.27) for densities with discontinuities is not reached. However, they state that the Sheather and Jones bandwidth still performs well in smaller sample size applications, which is why we chose to use this bandwidth selection method. We therefore used a Matlab script published by Dynare [19], mh_optimal_bandwidth, which we adapted slightly to fit our needs (e.g. removed options that were unnecessary in our application, to speed up the selection process and reduce the number of input variables). The kernel we used in the simulations is a Gaussian kernel, for the Sheather and Jones bandwidth selection method requires an at least six times differentiable kernel and the Epanechnikov kernel is not continuously differentiable.

Simulations were run on samples of different sizes, n = 50, 100, 500, 1000. For each sample size, 1000 samples were generated to which the kernel estimation methods have

been applied. The estimated densities were used to approximate the MSE and MISE of each method, where the average of the function values was used to approximate the expectation of the estimator and the sample variance to approximate its variance. Furthermore, to calculate the MISE from the MSE, the composite Simpson's rule was used to approximate the integral from the numerical values. From each sample, the density was estimated by a pure kernel estimator as in definition 1.1 with a Gaussian kernel (second column of table 2.1), the decreasing isotone monotone rearrangement of this kernel estimate (third column of table 2.1), the reflection method applied to the sample (fourth column of table 2.1) and the decreasing isotone monotone rearrangement applied to the reflection method (fifth column of table 2.1). After the calculation of the estimates, the MSE and the MISE of the estimators were calculated. Hereby we have to note that for the smaller sample size n = 50, the values of the MISE can vary when the simulations are repeated. For the larger sample sizes the mean integrated square errors are stable under repetition of the simulation.

We tried several ways to implement the monotone rearrangement algorithm. An implementation literally following the definition in equation (1.37) turned out to be inaccurate if the simulations had to be run within reasonable calculation time. Another method of computing the rearranged estimate was proposed in Chernozhukov et al. (2009) [2], using quantiles of the set of numerical function values as the rearrangement. This method proved to be far more accurate in terms of MISE, and required significantly less calculation time. The most accurate and efficient method, however, also proposed in the same paper [2] and in Anevski and Fougères (2008) [20], was implementing the rearrangement as a sorting operator. If the function values of \hat{f}, the function we want to apply the rearrangement to, are calculated on a fine enough grid of equidistant points, the decreasing rearrangement can simply be obtained by sorting these function values in decreasing order. Therefore, this is the method we used to calculate the mean integrated square errors for the rearrangements. For the reproducibility of these results, the full Matlab code used for the simulations can be found in Appendix B.4.

Table 2.1.: The MISE for the different methods of improving the kernel estimation for samples of size n. Columns: n, MISE, MISE rearr, MISE refl, MISE refl+rearr.

The mean square errors of the simulations for sample size n = 1000 are shown in figure 2.1, where it becomes clear that all methods have comparable performance away from the boundary x = 0. The difference in the performance of these methods lies, as we would expect, close to the boundary. A close-up of this area is shown in the second graph of figure 2.1.
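A condensed sketch of one cell of this simulation is given below (the full code is in Appendix B.4). Here a fixed, arbitrarily chosen bandwidth stands in for the Sheather and Jones selector, and the trapezoidal rule stands in for the composite Simpson's rule; kde_reflect is the reflection sketch from section 1.4.1.

    % Condensed sketch of the simulation for one sample size: MISE of the
    % rearranged reflection estimate of the exponential-1 density. A fixed
    % bandwidth replaces the Sheather and Jones selector and trapz replaces
    % Simpson's rule; all tuning values are arbitrary choices for illustration.
    rng(2);
    x  = linspace(0, 6, 601);                         % evaluation grid
    f  = exp(-x);                                     % exponential-1 target on the grid
    n  = 100; B = 1000; h = 0.4;
    K  = @(t) exp(-0.5 * t.^2) / sqrt(2 * pi);
    se = zeros(B, length(x));
    for b = 1:B
        X        = -log(rand(n, 1));                  % sample of size n from exponential-1
        fhat     = kde_reflect(x, X, h, K);           % reflection estimate (section 1.4.1)
        frearr   = sort(fhat, 'descend');             % its decreasing rearrangement
        se(b, :) = (frearr - f).^2;                   % squared error on the grid
    end
    mise = trapz(x, mean(se, 1));                     % approximate MISE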

Figure 2.1.: In the estimation of the exponential-1 density, the MSE of the pure kernel estimator (solid line), its monotone rearrangement (dashed line), the reflection method (dotted line) and its monotone rearrangement (dotted-dashed line). Note that the dotted line and the dotted-dashed line are indistinguishable in these plots. The graph below is a close-up of the area close to the boundary x = 0.

3. Conclusion

As we saw from table 2.1, for each sample size n, applying the decreasing rearrangement after applying the reflection method has no, or only a very small, effect. One could think the MISE is reduced in the simulation with sample size n = 50, but because the variation in the MISE for this sample size is relatively big, the small improvement is not significant. What does become clear from table 2.1 is that both the monotone rearrangement method and the reflection method significantly reduce the MISE. It also shows that in terms of the reduction of the MISE, the reflection method is preferred over monotone rearrangement.

However, in terms of computation time the monotone rearrangement algorithm performs better than the reflection method. The time necessary for rearranging a found estimate is negligible compared to the computation time necessary for kernel estimation. For example, for a sample of size n = 1000, unconstrained kernel estimation and monotone rearrangement take roughly 0.5 seconds, while the reflection method takes somewhat more than 1 second to compute. Rearranging the reflected estimate takes negligible time, but on the other hand does not improve the estimate significantly. Therefore, for extremely large samples it might be more desirable to use the monotone rearrangement algorithm than to use the reflection method.

The conclusion we can draw from the data in table 2.1 is that combining the method of monotone rearrangement and the reflection method does not significantly improve the estimate relative to just applying the reflection method.

4. Discussion and Review of the Progress

Several remarks can be made about the simulations run in this thesis. First of all, it should be stressed that the sample size n = 50 is too small to draw any conclusions from, for the MISE varies a lot when the estimation procedure is repeated on a different set of samples. Secondly, more monotone probability densities should be investigated to confirm the answer to the research question more firmly. And thirdly, the Sheather and Jones bandwidth selector we used to estimate the target density in this thesis is not optimal for discontinuous densities, as was stated in section 1.3. Therefore other bandwidth selection methods should be investigated to improve the density estimates. For example, for larger sample sizes, one could use the least squares cross-validation bandwidths, which are discussed in Stone (1984) [21]. These bandwidths are asymptotically equivalent to the optimal bandwidths, even in the non-smooth cases [18]. The suboptimal bandwidth might be the reason why the accuracy of the unconstrained kernel estimator is less than that in [14], where a different bandwidth selector was used. Since the monotone rearranged estimator is based on this less accurate kernel estimator, the monotone rearrangement will also be less accurate; thus the mean integrated square errors are not comparable to those from [14].

In this thesis, we started out with the introduction of kernel estimators and some basic properties of this method. Then we pointed out the boundary effects that occur when a target density with a discontinuity at x = 0 is estimated. As a solution to the loss of consistency, the reflection method was discussed. Monotone rearrangement, a method rearranging function values in ascending or descending order, was proposed for the case where a monotonic density is to be estimated. Simulations in Matlab were run to compare the reflection method and the method of monotone rearrangement separately and applied together.

Review of the Progress

Taking on a project extending over such a long period of time was new for me, and I learned a lot in the process. Looking back on the last few months, there are a lot of things I could have done better, but also a lot of things that worked out well. The most important lesson I learned in these months is that I should not hesitate to approach my supervisor. While reading the articles I used for this thesis, there were moments where I was struggling a lot with minor details, which in some cases cost me days. However, a simple explanation by my supervisor, sometimes just a clarification of a definition, was enough to get me back on track.

After the introduction to the subject of nonparametric density estimation and monotone

rearrangement by Dr. Van Es, I started familiarizing myself with the subject. There were a lot of books at hand with a proper introduction to kernel estimation to get acquainted with this principle. However, since the method of monotone rearrangement was only recently introduced into statistics, I had to get all the information on this subject out of articles that are not written for undergraduate students who are new to this field of research. This made it harder to fully understand the method and work with it. Therefore it took me more time than I had expected, especially because most of the time while I was working on this project I still had to follow courses, so I had to divide my attention. Only when the courses ended could this project finally get my full attention, and in those weeks the real progress was made. Furthermore, simulating in Matlab was relatively new to me, but fortunately it did not take me long to gain experience in the programming and I really enjoyed it.

Personally, I found the conclusion drawn rather surprising, for I had expected that monotone rearrangement would perform better in combination with the reflection method. If I had had more time for this project, I would have liked to find out why the combined performance of these methods is relatively poor. Also, I would have liked to take a look at another method of nonparametric density estimation, the method of nonparametric maximum likelihood estimators (NPMLE), and measure its performance relative to the kernel estimators. On top of that, I would have liked to learn more about asymptotic statistics to find out more about the asymptotic properties of different estimation methods. Luckily, I will get this chance next year during my master's.

To conclude, I would like to thank Dr. Van Es for introducing me to this subject and for his support during the process of writing this thesis.

5. Populaire Samenvatting

The subject of this bachelor thesis is the estimation of probability densities and various methods to improve such an estimate. To fully understand this, one of course needs to know what a probability density is. Loosely speaking, a probability density is a function that assigns to every event the probability of its occurrence. In the case where there are finitely many possibilities, this is even the definition of a probability density. To make everything clearer, we will explore these notions further with the help of an example.

Suppose we want to be able to predict the length of an eruption of a certain geyser. To do so, we stand next to the geyser with a stopwatch for a month, and every time it erupts we measure how many minutes the eruption lasts. At the end of the month we have made, say, 50 measurements (we will use the first half of the data set from Appendix C.2 to simulate these data). A commonly used way to represent these data is a histogram. We divide the time axis into bars, and whenever the length of an eruption falls within a bar, we make that bar a little higher. In the end we obtain a histogram in which the height of each bar is the relative frequency of that eruption length, as can be seen in the left panel of figure 5.1.

Figure 5.1.: Left: histogram of the eruption lengths (first half of the data in Appendix C.2). Right: estimate of the probability density based on the histogram.

If we now think away the bars, take only their tops and consider this as a curve, as is done in the right panel of figure 5.1, we could interpret this as an estimate of the probability density. The area under an interval


Adaptive Nonparametric Density Estimators Adaptive Nonparametric Density Estimators by Alan J. Izenman Introduction Theoretical results and practical application of histograms as density estimators usually assume a fixed-partition approach, where

More information

Chapter 1. Density Estimation

Chapter 1. Density Estimation Capter 1 Density Estimation Let X 1, X,..., X n be observations from a density f X x. Te aim is to use only tis data to obtain an estimate ˆf X x of f X x. Properties of f f X x x, Parametric metods f

More information

LECTURE NOTE #3 PROF. ALAN YUILLE

LECTURE NOTE #3 PROF. ALAN YUILLE LECTURE NOTE #3 PROF. ALAN YUILLE 1. Three Topics (1) Precision and Recall Curves. Receiver Operating Characteristic Curves (ROC). What to do if we do not fix the loss function? (2) The Curse of Dimensionality.

More information

The Growth of Functions. A Practical Introduction with as Little Theory as possible

The Growth of Functions. A Practical Introduction with as Little Theory as possible The Growth of Functions A Practical Introduction with as Little Theory as possible Complexity of Algorithms (1) Before we talk about the growth of functions and the concept of order, let s discuss why

More information

Nonparametric Density Estimation. October 1, 2018

Nonparametric Density Estimation. October 1, 2018 Nonparametric Density Estimation October 1, 2018 Introduction If we can t fit a distribution to our data, then we use nonparametric density estimation. Start with a histogram. But there are problems with

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

Slope Fields: Graphing Solutions Without the Solutions

Slope Fields: Graphing Solutions Without the Solutions 8 Slope Fields: Graphing Solutions Without the Solutions Up to now, our efforts have been directed mainly towards finding formulas or equations describing solutions to given differential equations. Then,

More information

L p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by

L p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by L p Functions Given a measure space (, µ) and a real number p [, ), recall that the L p -norm of a measurable function f : R is defined by f p = ( ) /p f p dµ Note that the L p -norm of a function f may

More information

Statistical inference

Statistical inference Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Smooth simultaneous confidence bands for cumulative distribution functions

Smooth simultaneous confidence bands for cumulative distribution functions Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang

More information

Supervised Learning: Non-parametric Estimation

Supervised Learning: Non-parametric Estimation Supervised Learning: Non-parametric Estimation Edmondo Trentin March 18, 2018 Non-parametric Estimates No assumptions are made on the form of the pdfs 1. There are 3 major instances of non-parametric estimates:

More information

On variable bandwidth kernel density estimation

On variable bandwidth kernel density estimation JSM 04 - Section on Nonparametric Statistics On variable bandwidth kernel density estimation Janet Nakarmi Hailin Sang Abstract In this paper we study the ideal variable bandwidth kernel estimator introduced

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

Nonparametric Econometrics

Nonparametric Econometrics Applied Microeconometrics with Stata Nonparametric Econometrics Spring Term 2011 1 / 37 Contents Introduction The histogram estimator The kernel density estimator Nonparametric regression estimators Semi-

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

BAYESIAN DECISION THEORY

BAYESIAN DECISION THEORY Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will

More information

Akaike Information Criterion to Select the Parametric Detection Function for Kernel Estimator Using Line Transect Data

Akaike Information Criterion to Select the Parametric Detection Function for Kernel Estimator Using Line Transect Data Journal of Modern Applied Statistical Methods Volume 12 Issue 2 Article 21 11-1-2013 Akaike Information Criterion to Select the Parametric Detection Function for Kernel Estimator Using Line Transect Data

More information

September Math Course: First Order Derivative

September Math Course: First Order Derivative September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which

More information

Density estimators for the convolution of discrete and continuous random variables

Density estimators for the convolution of discrete and continuous random variables Density estimators for the convolution of discrete and continuous random variables Ursula U Müller Texas A&M University Anton Schick Binghamton University Wolfgang Wefelmeyer Universität zu Köln Abstract

More information

Density Estimation (II)

Density Estimation (II) Density Estimation (II) Yesterday Overview & Issues Histogram Kernel estimators Ideogram Today Further development of optimization Estimating variance and bias Adaptive kernels Multivariate kernel estimation

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update 3. Juni 2013) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend

More information

Real Analysis Prof. S.H. Kulkarni Department of Mathematics Indian Institute of Technology, Madras. Lecture - 13 Conditional Convergence

Real Analysis Prof. S.H. Kulkarni Department of Mathematics Indian Institute of Technology, Madras. Lecture - 13 Conditional Convergence Real Analysis Prof. S.H. Kulkarni Department of Mathematics Indian Institute of Technology, Madras Lecture - 13 Conditional Convergence Now, there are a few things that are remaining in the discussion

More information

From Histograms to Multivariate Polynomial Histograms and Shape Estimation. Assoc Prof Inge Koch

From Histograms to Multivariate Polynomial Histograms and Shape Estimation. Assoc Prof Inge Koch From Histograms to Multivariate Polynomial Histograms and Shape Estimation Assoc Prof Inge Koch Statistics, School of Mathematical Sciences University of Adelaide Inge Koch (UNSW, Adelaide) Poly Histograms

More information

ARE211, Fall 2004 CONTENTS. 4. Univariate and Multivariate Differentiation (cont) Four graphical examples Taylor s Theorem 9

ARE211, Fall 2004 CONTENTS. 4. Univariate and Multivariate Differentiation (cont) Four graphical examples Taylor s Theorem 9 ARE211, Fall 24 LECTURE #18: TUE, NOV 9, 24 PRINT DATE: DECEMBER 17, 24 (CALCULUS3) CONTENTS 4. Univariate and Multivariate Differentiation (cont) 1 4.4. Multivariate Calculus: functions from R n to R

More information

Statistical Data Analysis

Statistical Data Analysis DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Northwestern University Department of Electrical Engineering and Computer Science

Northwestern University Department of Electrical Engineering and Computer Science Northwestern University Department of Electrical Engineering and Computer Science EECS 454: Modeling and Analysis of Communication Networks Spring 2008 Probability Review As discussed in Lecture 1, probability

More information

ESTIMATORS IN THE CONTEXT OF ACTUARIAL LOSS MODEL A COMPARISON OF TWO NONPARAMETRIC DENSITY MENGJUE TANG A THESIS MATHEMATICS AND STATISTICS

ESTIMATORS IN THE CONTEXT OF ACTUARIAL LOSS MODEL A COMPARISON OF TWO NONPARAMETRIC DENSITY MENGJUE TANG A THESIS MATHEMATICS AND STATISTICS A COMPARISON OF TWO NONPARAMETRIC DENSITY ESTIMATORS IN THE CONTEXT OF ACTUARIAL LOSS MODEL MENGJUE TANG A THESIS IN THE DEPARTMENT OF MATHEMATICS AND STATISTICS PRESENTED IN PARTIAL FULFILLMENT OF THE

More information

Answers for Calculus Review (Extrema and Concavity)

Answers for Calculus Review (Extrema and Concavity) Answers for Calculus Review 4.1-4.4 (Extrema and Concavity) 1. A critical number is a value of the independent variable (a/k/a x) in the domain of the function at which the derivative is zero or undefined.

More information

1.5 Approximate Identities

1.5 Approximate Identities 38 1 The Fourier Transform on L 1 (R) which are dense subspaces of L p (R). On these domains, P : D P L p (R) and M : D M L p (R). Show, however, that P and M are unbounded even when restricted to these

More information

arxiv: v1 [stat.me] 17 Jan 2008

arxiv: v1 [stat.me] 17 Jan 2008 Some thoughts on the asymptotics of the deconvolution kernel density estimator arxiv:0801.2600v1 [stat.me] 17 Jan 08 Bert van Es Korteweg-de Vries Instituut voor Wiskunde Universiteit van Amsterdam Plantage

More information

arxiv: v2 [stat.me] 13 Sep 2007

arxiv: v2 [stat.me] 13 Sep 2007 Electronic Journal of Statistics Vol. 0 (0000) ISSN: 1935-7524 DOI: 10.1214/154957804100000000 Bandwidth Selection for Weighted Kernel Density Estimation arxiv:0709.1616v2 [stat.me] 13 Sep 2007 Bin Wang

More information

Bandwith selection based on a special choice of the kernel

Bandwith selection based on a special choice of the kernel Bandwith selection based on a special choice of the kernel Thomas Oksavik Master of Science in Physics and Mathematics Submission date: June 2007 Supervisor: Nikolai Ushakov, MATH Norwegian University

More information

DS-GA 1002 Lecture notes 2 Fall Random variables

DS-GA 1002 Lecture notes 2 Fall Random variables DS-GA 12 Lecture notes 2 Fall 216 1 Introduction Random variables Random variables are a fundamental tool in probabilistic modeling. They allow us to model numerical quantities that are uncertain: the

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows Kn-Nearest

More information

Foundations of Analysis. Joseph L. Taylor. University of Utah

Foundations of Analysis. Joseph L. Taylor. University of Utah Foundations of Analysis Joseph L. Taylor University of Utah Contents Preface vii Chapter 1. The Real Numbers 1 1.1. Sets and Functions 2 1.2. The Natural Numbers 8 1.3. Integers and Rational Numbers 16

More information

Chapter 2: Resampling Maarten Jansen

Chapter 2: Resampling Maarten Jansen Chapter 2: Resampling Maarten Jansen Randomization tests Randomized experiment random assignment of sample subjects to groups Example: medical experiment with control group n 1 subjects for true medicine,

More information

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive

More information

Density Estimation. We are concerned more here with the non-parametric case (see Roger Barlow s lectures for parametric statistics)

Density Estimation. We are concerned more here with the non-parametric case (see Roger Barlow s lectures for parametric statistics) Density Estimation Density Estimation: Deals with the problem of estimating probability density functions (PDFs) based on some data sampled from the PDF. May use assumed forms of the distribution, parameterized

More information

Relationships Between Quantities

Relationships Between Quantities Algebra 1 Relationships Between Quantities Relationships Between Quantities Everyone loves math until there are letters (known as variables) in problems!! Do students complain about reading when they come

More information

A VERY BRIEF REVIEW OF MEASURE THEORY

A VERY BRIEF REVIEW OF MEASURE THEORY A VERY BRIEF REVIEW OF MEASURE THEORY A brief philosophical discussion. Measure theory, as much as any branch of mathematics, is an area where it is important to be acquainted with the basic notions and

More information

Positive data kernel density estimation via the logkde package for R

Positive data kernel density estimation via the logkde package for R Positive data kernel density estimation via the logkde package for R Andrew T. Jones 1, Hien D. Nguyen 2, and Geoffrey J. McLachlan 1 which is constructed from the sample { i } n i=1. Here, K (x) is a

More information

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems

More information

Introduction to Regression

Introduction to Regression Introduction to Regression Chad M. Schafer May 20, 2015 Outline General Concepts of Regression, Bias-Variance Tradeoff Linear Regression Nonparametric Procedures Cross Validation Local Polynomial Regression

More information

A tailor made nonparametric density estimate

A tailor made nonparametric density estimate A tailor made nonparametric density estimate Daniel Carando 1, Ricardo Fraiman 2 and Pablo Groisman 1 1 Universidad de Buenos Aires 2 Universidad de San Andrés School and Workshop on Probability Theory

More information

Review: Limits of Functions - 10/7/16

Review: Limits of Functions - 10/7/16 Review: Limits of Functions - 10/7/16 1 Right and Left Hand Limits Definition 1.0.1 We write lim a f() = L to mean that the function f() approaches L as approaches a from the left. We call this the left

More information

Recall the Basics of Hypothesis Testing

Recall the Basics of Hypothesis Testing Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE

More information

Log-Density Estimation with Application to Approximate Likelihood Inference

Log-Density Estimation with Application to Approximate Likelihood Inference Log-Density Estimation with Application to Approximate Likelihood Inference Martin Hazelton 1 Institute of Fundamental Sciences Massey University 19 November 2015 1 Email: m.hazelton@massey.ac.nz WWPMS,

More information

I. ANALYSIS; PROBABILITY

I. ANALYSIS; PROBABILITY ma414l1.tex Lecture 1. 12.1.2012 I. NLYSIS; PROBBILITY 1. Lebesgue Measure and Integral We recall Lebesgue measure (M411 Probability and Measure) λ: defined on intervals (a, b] by λ((a, b]) := b a (so

More information

2tdt 1 y = t2 + C y = which implies C = 1 and the solution is y = 1

2tdt 1 y = t2 + C y = which implies C = 1 and the solution is y = 1 Lectures - Week 11 General First Order ODEs & Numerical Methods for IVPs In general, nonlinear problems are much more difficult to solve than linear ones. Unfortunately many phenomena exhibit nonlinear

More information

Calculus at Rutgers. Course descriptions

Calculus at Rutgers. Course descriptions Calculus at Rutgers This edition of Jon Rogawski s text, Calculus Early Transcendentals, is intended for students to use in the three-semester calculus sequence Math 151/152/251 beginning with Math 151

More information

arxiv: v1 [stat.me] 11 Sep 2007

arxiv: v1 [stat.me] 11 Sep 2007 Electronic Journal of Statistics Vol. 0 (0000) ISSN: 1935-7524 arxiv:0709.1616v1 [stat.me] 11 Sep 2007 Bandwidth Selection for Weighted Kernel Density Estimation Bin Wang Mathematics and Statistics Department,

More information

Statistics for Python

Statistics for Python Statistics for Python An extension module for the Python scripting language Michiel de Hoon, Columbia University 2 September 2010 Statistics for Python, an extension module for the Python scripting language.

More information

STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN

STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN Massimo Guidolin Massimo.Guidolin@unibocconi.it Dept. of Finance STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN SECOND PART, LECTURE 2: MODES OF CONVERGENCE AND POINT ESTIMATION Lecture 2:

More information

Modern Physics Part 2: Special Relativity

Modern Physics Part 2: Special Relativity Modern Physics Part 2: Special Relativity Last modified: 23/08/2018 Links Relative Velocity Fluffy and the Tennis Ball Fluffy and the Car Headlights Special Relativity Relative Velocity Example 1 Example

More information

Supporting Australian Mathematics Project. A guide for teachers Years 11 and 12. Probability and statistics: Module 25. Inference for means

Supporting Australian Mathematics Project. A guide for teachers Years 11 and 12. Probability and statistics: Module 25. Inference for means 1 Supporting Australian Mathematics Project 2 3 4 6 7 8 9 1 11 12 A guide for teachers Years 11 and 12 Probability and statistics: Module 2 Inference for means Inference for means A guide for teachers

More information

Automatic Differentiation and Neural Networks

Automatic Differentiation and Neural Networks Statistical Machine Learning Notes 7 Automatic Differentiation and Neural Networks Instructor: Justin Domke 1 Introduction The name neural network is sometimes used to refer to many things (e.g. Hopfield

More information

Experiment 2 Random Error and Basic Statistics

Experiment 2 Random Error and Basic Statistics PHY9 Experiment 2: Random Error and Basic Statistics 8/5/2006 Page Experiment 2 Random Error and Basic Statistics Homework 2: Turn in at start of experiment. Readings: Taylor chapter 4: introduction, sections

More information

MATH 1231 MATHEMATICS 1B CALCULUS. Section 5: - Power Series and Taylor Series.

MATH 1231 MATHEMATICS 1B CALCULUS. Section 5: - Power Series and Taylor Series. MATH 1231 MATHEMATICS 1B CALCULUS. Section 5: - Power Series and Taylor Series. The objective of this section is to become familiar with the theory and application of power series and Taylor series. By

More information

3 Nonparametric Density Estimation

3 Nonparametric Density Estimation 3 Nonparametric Density Estimation Example: Income distribution Source: U.K. Family Expenditure Survey (FES) 1968-1995 Approximately 7000 British Households per year For each household many different variables

More information

Adaptive Kernel Estimation of The Hazard Rate Function

Adaptive Kernel Estimation of The Hazard Rate Function Adaptive Kernel Estimation of The Hazard Rate Function Raid Salha Department of Mathematics, Islamic University of Gaza, Palestine, e-mail: rbsalha@mail.iugaza.edu Abstract In this paper, we generalized

More information

Analysis methods of heavy-tailed data

Analysis methods of heavy-tailed data Institute of Control Sciences Russian Academy of Sciences, Moscow, Russia February, 13-18, 2006, Bamberg, Germany June, 19-23, 2006, Brest, France May, 14-19, 2007, Trondheim, Norway PhD course Chapter

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

The Learning Problem and Regularization Class 03, 11 February 2004 Tomaso Poggio and Sayan Mukherjee

The Learning Problem and Regularization Class 03, 11 February 2004 Tomaso Poggio and Sayan Mukherjee The Learning Problem and Regularization 9.520 Class 03, 11 February 2004 Tomaso Poggio and Sayan Mukherjee About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem set 1 Due Thursday, September 19, in class What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.

More information

Metric spaces and metrizability

Metric spaces and metrizability 1 Motivation Metric spaces and metrizability By this point in the course, this section should not need much in the way of motivation. From the very beginning, we have talked about R n usual and how relatively

More information

MATH 1231 MATHEMATICS 1B CALCULUS. Section 4: - Convergence of Series.

MATH 1231 MATHEMATICS 1B CALCULUS. Section 4: - Convergence of Series. MATH 23 MATHEMATICS B CALCULUS. Section 4: - Convergence of Series. The objective of this section is to get acquainted with the theory and application of series. By the end of this section students will

More information