
UNIVERSITATIS IAGELLONICAE ACTA MATHEMATICA, FASCICULUS XLIII, 2005

ON SOME TWO-STEP DENSITY ESTIMATION METHOD

by Jolanta Jarnicka

Abstract. We introduce a new two-step kernel density estimation method, based on the EM algorithm and the generalized kernel density estimator. The accuracy obtained is better, in particular, in the case of multimodal or skewed densities.

1. Introduction. Density estimation has been investigated in many papers. In particular, the kernel estimation method, introduced in [11], has received much attention. Let us recall some problems connected with kernel density estimation. Let $X_i : \Omega \to \mathbb{R}$, $i = 1, \ldots, n$, be a sequence of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$. Suppose that they are identically and independently distributed with an unknown density $f$, and that $\{x_i\}$ is the sequence of corresponding observations. The kernel density estimator is characterized by two components: the bandwidth $h(n)$ and the kernel $K$.

Definition 1.1. Let $\{h(n)\}_{n=1}^{\infty} \subset (0, +\infty)$ with $h(n) \to 0$. Let $K : \mathbb{R} \to \mathbb{R}$ be a measurable nonnegative function. The kernel density estimator is given by the formula
\[
(1.1)\qquad \hat f_h(x) = \frac{1}{n\,h(n)} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h(n)}\right).
\]
We call $h$ the bandwidth (window width or smoothing parameter). We call $K$ the kernel.

Statistical properties of this estimator, like biasedness, asymptotical unbiasedness, and efficiency, were presented in [11], [10], and [13].
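A minimal numerical sketch of the estimator (1.1). The sample, the Gaussian kernel, and the bandwidth value below are illustrative assumptions, not part of the paper.

```python
import numpy as np

def gaussian_kernel(t):
    """Gaussian kernel K(t) = exp(-t^2/2) / sqrt(2*pi)."""
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

def kde(x, sample, h, kernel=gaussian_kernel):
    """Kernel density estimator (1.1): average of scaled kernels centered at the observations."""
    x = np.atleast_1d(x)
    # shape (len(x), n): kernel evaluated at (x - x_i) / h
    u = (x[:, None] - sample[None, :]) / h
    return kernel(u).mean(axis=1) / h

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(size=200)          # illustrative i.i.d. sample
    grid = np.linspace(-4, 4, 9)
    print(kde(grid, sample, h=0.4))
```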

The kernel determines the regularity, symmetry about zero, and shape of the estimator, while the bandwidth determines the amount of smoothing. In particular, $\hat f_h$ is a density, provided that $K \geq 0$ and $\int_{\mathbb{R}} K(t)\,dt = 1$. Considerable research has been carried out on the question of how one should select $K$ in order to optimize the properties of the kernel density estimator $\hat f_h$ (see e.g. [5], [4], and [3]) and numerous examples have been discussed. A suggested choice is a density symmetric about zero with finite variance, like e.g. the Gaussian kernel $K(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$, the Epanechnikov kernel $K(t) = \frac{3}{4\sqrt{5}}\left(1 - \frac{t^2}{5}\right)$ for $|t| < \sqrt{5}$, or the rectangular kernel $K(t) = \frac{1}{2}$ for $|t| < 1$ (see e.g. [3] or [4] for more examples). But in some situations nonsymmetric kernels, e.g. the exponential kernel $K(t) = e^{-t}$ for $t > 0$, guarantee better results. A comparative study of this problem was carried out in [7].

The main problem of kernel density estimation is still the choice of the bandwidth $h(n)$. Various measures of accuracy of the estimator $\hat f_h$ have been used to obtain the optimal bandwidth (see e.g. [3], [13]). We will here focus on the mean integrated squared error, given by
\[
\mathrm{MISE}(\hat f_h) = E \int_{\mathbb{R}} \big(\hat f_h(x) - f(x)\big)^2\,dx,
\]
and its asymptotic version for $h(n) \to 0$ and $n\,h(n) \to \infty$ as $n \to \infty$, often abbreviated as $\mathrm{AMISE}(\hat f_h)$ (see [5], [8]). The mean integrated squared error can be written as the sum of the integrated squared bias and the integrated variance of $\hat f_h$:
\[
\mathrm{MISE}(\hat f_h) = \int_{\mathbb{R}} \big(E\hat f_h(x) - f(x)\big)^2\,dx + \int_{\mathbb{R}} V\big(\hat f_h(x)\big)\,dx.
\]
We want to keep both the bias and the variance possibly small, so the optimal bandwidth $h(n)$ should be chosen so that MISE is minimal. Under additional assumptions on the density $f$ (we assume that $f$ is twice differentiable and that $\int_{\mathbb{R}} (f''(x))^2\,dx < \infty$), and for a kernel being a symmetric density with finite variance, we can provide asymptotic approximations for the bias and the variance of $\hat f_h$ (see [13]). Having obtained the formula for the asymptotic mean integrated squared error, we get the optimal bandwidth $h(n)$ in the following form:
\[
(1.2)\qquad h_{\mathrm{opt}}(n) = \left( \frac{\int_{\mathbb{R}} K^2(t)\,dt}{\left(\int_{\mathbb{R}} t^2 K(t)\,dt\right)^2 \int_{\mathbb{R}} (f''(x))^2\,dx} \right)^{1/5} n^{-1/5}.
\]
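To illustrate formula (1.2): for a known kernel the constants $\int K^2$ and $\int t^2 K$ can be computed numerically, and only $\int (f'')^2$ has to be supplied. The sketch below evaluates (1.2) for the Gaussian kernel in the (hypothetical) case where $f$ is itself a standard normal density, for which $\int (f''(x))^2\,dx = 3/(8\sqrt{\pi})$; all function names are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad

def h_opt(n, kernel, f_second_deriv_sq_integral):
    """Optimal bandwidth (1.2) for a kernel with finite second moment."""
    R_K, _ = quad(lambda t: kernel(t) ** 2, -np.inf, np.inf)        # int K^2(t) dt
    mu2, _ = quad(lambda t: t ** 2 * kernel(t), -np.inf, np.inf)    # int t^2 K(t) dt
    return (R_K / (mu2 ** 2 * f_second_deriv_sq_integral)) ** 0.2 * n ** -0.2

gauss = lambda t: np.exp(-0.5 * t ** 2) / np.sqrt(2 * np.pi)
# For f = standard normal, int (f'')^2 dx = 3 / (8 * sqrt(pi)).
print(h_opt(200, gauss, 3 / (8 * np.sqrt(np.pi))))   # approx 1.06 * n**(-1/5)
```

With the sample standard deviation in place of the reference value this reduces to the rule of thumb $1.06\,\hat\sigma\,n^{-1/5}$ recalled below.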

Although the integrals $\int_{\mathbb{R}} K^2(t)\,dt$ and $\int_{\mathbb{R}} t^2 K(t)\,dt$ can be calculated if $K$ is known, formula (1.2) depends on the unknown density $f$, and hence, in practice, the ability to calculate $h_{\mathrm{opt}}(n)$ depends on obtaining an estimate of $\int_{\mathbb{R}} (f''(x))^2\,dx$. Several ways to solve this problem have appeared in the literature. One of the simplest methods, suggested in [13] and often called Silverman's rule of thumb, is to assume that $f$ is known. For example, assuming that $f$ is the density of a normal distribution with parameters $\mu$ and $\sigma^2$, and using the Gaussian kernel, we obtain $h_{\mathrm{opt}}(n) = 1.06\,\sigma\,n^{-1/5}$, where $\sigma$ can be estimated by the sample standard deviation. Other well-known methods (beside those giving automatic procedures for selecting the bandwidth, like cross-validation, see e.g. [13]) are based on the idea of constructing a pilot function and then using it in the actual estimation (see e.g. [3], [9], [1], [12], [15], and [6]). The problem is that none of these methods guarantees good results for a wide class of estimated density functions.

In this paper we propose a new two-step kernel density estimation method, which generalizes methods considered in [3] and [6]. Its practical application is in fact based on the solution of an optimization problem. In Section 2 we present the idea of the two-step method and prove some basic properties of the proposed generalized kernel density estimator. Section 3 is dedicated to the problem of the choice of the optimal bandwidths $\{h_j(n)\}_{j=1}^m$. In Section 4 we propose an algorithm which enables us to construct a pilot. The last section contains a discussion of an example which presents the possibilities of our method. We emphasize that we have carried out a large number of experiments and computer simulations in order to check the efficiency of the new method in comparison with the methods mentioned. According to the statistical analysis, we can state that in some cases (e.g. when estimating bimodal or skewed densities) our method gives better results than other known methods, and in numerous cases its efficiency is comparable to that of the methods mentioned (see [7]).

2. The idea of the two-step method. Let $(\Omega, \mathcal{F}, P)$ be a probability space, $X_i : \Omega \to \mathbb{R}$, $i = 1, 2, \ldots, n$, a sequence of random variables, and suppose they are independently and identically distributed. We assume that $f$ is the unknown density function of $X_i$ and $\{x_i\}_{i=1}^n$ is the sequence of observations on $X_i$. Consider a family of functions $\{\phi_j(x)\}_{j=1}^m$ such that
\[
0 \leq \phi_j(x) \leq 1, \qquad \sum_{j=1}^{m} \phi_j(x) = 1, \qquad x \in \mathbb{R},
\]

and a sequence of corresponding bandwidths $\{h_j(n)\}_{j=1}^m$, such that $h_j(n) > 0$ and $h_j(n) \to 0$, $j = 1, \ldots, m$. Assume also that the kernel $K : \mathbb{R} \to \mathbb{R}$ is a nonnegative measurable function.

Definition 2.1. We define the generalized kernel density estimator as the function given by
\[
(2.1)\qquad \hat f_\phi(x) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{\phi_j(x_i)}{h_j(n)}\, K\!\left(\frac{x - x_i}{h_j(n)}\right).
\]

The proposed method consists of two steps:
1. We choose a pilot function $f_0$ by parametric estimation, using the EM algorithm (see Section 4), under the assumption that $f_0$ is given as follows
\[
f_0(x) = \sum_{j} \alpha_j f_j(x), \qquad \text{where} \quad f_j(x) = \frac{1}{\sqrt{2\pi\sigma_j^2}}\, e^{-\frac{(x - \mu_j)^2}{2\sigma_j^2}}, \quad j = 1, \ldots, m.
\]
2. Considering the generalized kernel density estimator (nonparametric estimation), we choose a sequence of bandwidths $h_j(n) > 0$, $h_j(n) \to 0$ for $n \to \infty$, $j = 1, \ldots, m$, using the pilot calculated in step 1, by taking
\[
(2.2)\qquad \phi_j(x) = \frac{\alpha_j f_j(x)}{f_0(x)} = \frac{\alpha_j f_j(x)}{\sum_{k=1}^{m} \alpha_k f_k(x)}.
\]

Remark 2.2. The first modification with respect to the traditional kernel estimation method is the fact that we choose a sequence of bandwidths $h_j(n)$, $j = 1, \ldots, m$, instead of a single value $h(n)$. This may cause some complication in the calculations but guarantees a better accuracy of the estimator.

Remark 2.3. Formula (2.1) corresponds to the methods described in [1], [3], and [5]. The idea of constructing the pilot function by parametric estimation comes from [6].

Remark 2.4. Note that for $m = 1$, taking $\phi_1 \equiv 1$, we get the kernel density estimator (1.1) with the bandwidth $h_1(n)$.
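A sketch of the estimator (2.1) with the weights $\phi_j$ of (2.2) computed from a two-component normal pilot; the pilot parameters, bandwidth values, and all names are illustrative assumptions, not the paper's fitted values.

```python
import numpy as np

def normal_pdf(x, mu, sigma2):
    return np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

def phi_weights(xi, alphas, mus, sigma2s):
    """Weights (2.2): phi_j(x) = alpha_j f_j(x) / f_0(x)."""
    comps = np.array([a * normal_pdf(xi, m, s2) for a, m, s2 in zip(alphas, mus, sigma2s)])
    return comps / comps.sum(axis=0)

def generalized_kde(x, sample, h, alphas, mus, sigma2s,
                    kernel=lambda t: np.exp(-0.5 * t ** 2) / np.sqrt(2 * np.pi)):
    """Generalized kernel estimator (2.1): each x_i contributes through m kernels
    with bandwidths h_j, weighted by phi_j(x_i)."""
    x = np.atleast_1d(x)
    phi = phi_weights(sample, alphas, mus, sigma2s)          # shape (m, n)
    out = np.zeros_like(x, dtype=float)
    for j, h_j in enumerate(h):
        u = (x[:, None] - sample[None, :]) / h_j
        out += (phi[j] * kernel(u)).sum(axis=1) / h_j
    return out / len(sample)

rng = np.random.default_rng(2)
sample = np.concatenate([rng.normal(-2, 0.5, 60), rng.normal(1, 1.0, 140)])
est = generalized_kde(np.linspace(-4, 4, 9), sample,
                      h=[0.3, 0.5], alphas=[0.3, 0.7], mus=[-2.0, 1.0], sigma2s=[0.25, 1.0])
print(est)
```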

Remark 2.5. As in the case of the kernel density estimator (1.1), the kernel $K$ has an essential effect on the properties of $\hat f_\phi$. In particular, if $K$ is a density, so that $K(x) \geq 0$ and $\int_{\mathbb{R}} K(x)\,dx = 1$, then also $\hat f_\phi(x) \geq 0$ and
\[
\int_{\mathbb{R}} \hat f_\phi(x)\,dx = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} \phi_j(x_i) \int_{\mathbb{R}} \frac{1}{h_j(n)}\, K\!\left(\frac{x - x_i}{h_j(n)}\right) dx = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} \phi_j(x_i) \int_{\mathbb{R}} K(t)\,dt = 1,
\]
where we put $\frac{x - x_i}{h_j(n)} = t$ for a fixed $j$ and use $\sum_{j=1}^{m} \phi_j(x_i) = 1$.

2.1. Properties of the generalized kernel density estimator. We will now present some statistical properties of $\hat f_\phi$ and then use its asymptotic behaviour in a criterion for the choice of $h_j(n)$, $j = 1, \ldots, m$. Under the above assumptions and notation, the following lemma is true.

Lemma 2.6. Assume that
\[
E\left|\phi_j(X_i)\, K\!\left(\frac{x - X_i}{h_j(n)}\right)\right| < \infty \quad \text{and} \quad E\left|\phi_j(X_i)\phi_k(X_i)\, K\!\left(\frac{x - X_i}{h_j(n)}\right) K\!\left(\frac{x - X_i}{h_k(n)}\right)\right| < \infty, \qquad j, k = 1, \ldots, m.
\]
Then
\[
(2.3)\qquad E\hat f_\phi(x) = \sum_{j=1}^{m} \int_{\mathbb{R}} K(y)\, \phi_j(x + h_j(n)y)\, f(x + h_j(n)y)\,dy,
\]
\[
(2.4)\qquad V\hat f_\phi(x) = \frac{1}{n} \sum_{j,k=1}^{m} \frac{1}{h_j(n) h_k(n)} \int_{\mathbb{R}} K\!\left(\frac{x - y}{h_j(n)}\right) K\!\left(\frac{x - y}{h_k(n)}\right) \phi_j(y)\phi_k(y) f(y)\,dy - \frac{1}{n} \left( \sum_{j=1}^{m} \int_{\mathbb{R}} K(y)\, \phi_j(x + h_j(n)y)\, f(x + h_j(n)y)\,dy \right)^{2}.
\]

Proof. Fix $n \in \mathbb{N}$. Since $K$ is a measurable nonnegative function and $\{X_i\}_{i=1}^n$ is a sequence of identically and independently distributed random variables, $\left\{\sum_{j=1}^{m} \frac{\phi_j(X_i)}{h_j(n)} K\!\left(\frac{x - X_i}{h_j(n)}\right)\right\}_{i=1}^{n}$ forms a sequence of identically and independently distributed random variables. Thus
\[
E\hat f_\phi(x) = \frac{1}{n} \sum_{i=1}^{n} E\left[\sum_{j=1}^{m} \frac{\phi_j(X_i)}{h_j(n)} K\!\left(\frac{x - X_i}{h_j(n)}\right)\right] = \sum_{j=1}^{m} E\left[\frac{\phi_j(X_i)}{h_j(n)} K\!\left(\frac{x - X_i}{h_j(n)}\right)\right].
\]

Observe that, for every $j = 1, \ldots, m$,
\[
E\left[\frac{\phi_j(X_i)}{h_j(n)} K\!\left(\frac{x - X_i}{h_j(n)}\right)\right] = \frac{1}{h_j(n)} \int_{\mathbb{R}} K\!\left(\frac{x - t}{h_j(n)}\right) \phi_j(t) f(t)\,dt = \int_{\mathbb{R}} K(y)\, \phi_j(x + h_j(n)y)\, f(x + h_j(n)y)\,dy,
\]
by a linear change of variables. Hence, summing both sides over $j = 1, \ldots, m$, we get the expected value of the generalized kernel estimator,
\[
E\hat f_\phi(x) = \sum_{j=1}^{m} \int_{\mathbb{R}} K(y)\, \phi_j(x + h_j(n)y)\, f(x + h_j(n)y)\,dy.
\]
To prove (2.4), we use the equation $V\hat f_\phi(x) = E\hat f_\phi^{\,2}(x) - \big(E\hat f_\phi(x)\big)^2$. There is
\[
V\hat f_\phi(x) = \frac{1}{n^2} \sum_{i=1}^{n} \left\{ E\left[\sum_{j=1}^{m} \frac{\phi_j(X_i)}{h_j(n)} K\!\left(\frac{x - X_i}{h_j(n)}\right) \sum_{k=1}^{m} \frac{\phi_k(X_i)}{h_k(n)} K\!\left(\frac{x - X_i}{h_k(n)}\right)\right] - \left[\sum_{j=1}^{m} E\left(\frac{\phi_j(X_i)}{h_j(n)} K\!\left(\frac{x - X_i}{h_j(n)}\right)\right)\right]^{2} \right\}.
\]

Thanks to the fact that the $X_i$, $i = 1, \ldots, n$, are identically and independently distributed, the first term can be written as
\[
\frac{1}{n^2} \sum_{i=1}^{n} E\left[\sum_{j=1}^{m} \frac{\phi_j(X_i)}{h_j(n)} K\!\left(\frac{x - X_i}{h_j(n)}\right) \sum_{k=1}^{m} \frac{\phi_k(X_i)}{h_k(n)} K\!\left(\frac{x - X_i}{h_k(n)}\right)\right] = \frac{1}{n} \sum_{j=1}^{m} \sum_{k=1}^{m} \frac{1}{h_j(n) h_k(n)} \int_{\mathbb{R}} K\!\left(\frac{x - y}{h_j(n)}\right) K\!\left(\frac{x - y}{h_k(n)}\right) \phi_j(y)\phi_k(y) f(y)\,dy.
\]
Hence the variance of the generalized kernel estimator is given by
\[
V\hat f_\phi(x) = \frac{1}{n} \sum_{j,k=1}^{m} \frac{1}{h_j(n) h_k(n)} \int_{\mathbb{R}} K\!\left(\frac{x - y}{h_j(n)}\right) K\!\left(\frac{x - y}{h_k(n)}\right) \phi_j(y)\phi_k(y) f(y)\,dy - \frac{1}{n} \left( \sum_{j=1}^{m} \int_{\mathbb{R}} K(y)\, \phi_j(x + h_j(n)y)\, f(x + h_j(n)y)\,dy \right)^{2},
\]
which completes the proof.

Remark 2.7. Taking $m = 1$ and $\phi_1 \equiv 1$ in Lemma 2.6, we obtain the formulae for the expected value and variance of the traditional kernel estimator $\hat f_h$. They can be found, for example, in [13].

In order to show further statistical properties of $\hat f_\phi$, we use the following lemma (see [10] for a similar result for the traditional kernel estimator).

Lemma 2.8. Under the above assumptions, let $K$ be a bounded and integrable function such that $\lim_{|t| \to \infty} |t K(t)| = 0$, and let $h_j(n) > 0$, $h_j(n) \to 0$ for $n \to \infty$ and $j = 1, \ldots, m$. Then, for every point $x$ of continuity of $f$,
\[
\sum_{j=1}^{m} E\left[\frac{\phi_j(X_i)}{h_j(n)} K\!\left(\frac{x - X_i}{h_j(n)}\right)\right] \longrightarrow f(x) \int_{\mathbb{R}} K(t)\,dt, \qquad n \to \infty.
\]

Proof. Our goal is to show that, for every $j = 1, \ldots, m$,
\[
\lim_{n \to \infty} \frac{1}{h_j(n)} \int_{\mathbb{R}} \big( f(x + \zeta)\phi_j(x + \zeta) - f(x)\phi_j(x + \zeta) \big)\, K\!\left(\frac{\zeta}{h_j(n)}\right) d\zeta = 0.
\]

For a fixed $j$, there is
\[
\left| \frac{1}{h_j(n)} \int_{\mathbb{R}} \big( f(x + \zeta)\phi_j(x + \zeta) - f(x)\phi_j(x + \zeta) \big)\, K\!\left(\frac{\zeta}{h_j(n)}\right) d\zeta \right| \leq \frac{1}{h_j(n)} \int_{\mathbb{R}} \big| f(x + \zeta) - f(x) \big|\, \phi_j(x + \zeta)\, K\!\left(\frac{\zeta}{h_j(n)}\right) d\zeta,
\]
for $K$ is nonnegative. Since $0 \leq \phi_j(x + \zeta) \leq 1$, the right-hand side does not exceed
\[
\frac{1}{h_j(n)} \int_{\mathbb{R}} \big| f(x + \zeta) - f(x) \big|\, K\!\left(\frac{\zeta}{h_j(n)}\right) d\zeta.
\]
Take an arbitrary $\delta > 0$. Splitting the integration interval into two parts, $\{\zeta : |\zeta| \geq \delta\}$ and $\{\zeta : |\zeta| < \delta\}$, and applying the triangle inequality to the first term, we get
\[
\frac{1}{h_j(n)} \int_{\{|\zeta| \geq \delta\}} \big| f(x + \zeta) - f(x) \big|\, K\!\left(\frac{\zeta}{h_j(n)}\right) d\zeta \leq \frac{1}{h_j(n)} \int_{\{|\zeta| \geq \delta\}} f(x + \zeta)\, K\!\left(\frac{\zeta}{h_j(n)}\right) d\zeta + \frac{f(x)}{h_j(n)} \int_{\{|\zeta| \geq \delta\}} K\!\left(\frac{\zeta}{h_j(n)}\right) d\zeta.
\]

Since $K$ is bounded and $f$ is a density, we get
\[
\frac{1}{h_j(n)} \int_{\{|\zeta| \geq \delta\}} f(x + \zeta)\, K\!\left(\frac{\zeta}{h_j(n)}\right) d\zeta = \int_{\{|\zeta| \geq \delta\}} f(x + \zeta)\, \frac{\zeta}{h_j(n)} K\!\left(\frac{\zeta}{h_j(n)}\right) \frac{d\zeta}{\zeta} \leq \frac{1}{\delta} \sup_{|y| \geq \delta/h_j(n)} |y K(y)|
\]
and
\[
\frac{f(x)}{h_j(n)} \int_{\{|\zeta| \geq \delta\}} K\!\left(\frac{\zeta}{h_j(n)}\right) d\zeta = f(x) \int_{\{|z| \geq \delta/h_j(n)\}} K(z)\,dz.
\]
Hence, for $n \to \infty$, the first term tends to $0$:
\[
\frac{1}{h_j(n)} \int_{\{|\zeta| \geq \delta\}} \big| f(x + \zeta) - f(x) \big|\, K\!\left(\frac{\zeta}{h_j(n)}\right) d\zeta \leq \frac{1}{\delta} \sup_{|y| \geq \delta/h_j(n)} |y K(y)| + f(x) \int_{\{|z| \geq \delta/h_j(n)\}} K(z)\,dz \longrightarrow 0,
\]
since $h_j(n) \to 0$, $\lim_{|t| \to \infty} |t K(t)| = 0$, and $K$ is integrable. From the continuity of $f$ at $x$, the second integral may be estimated as follows:
\[
\frac{1}{h_j(n)} \int_{\{|\zeta| < \delta\}} \big| f(x + \zeta) - f(x) \big|\, K\!\left(\frac{\zeta}{h_j(n)}\right) d\zeta \leq \varepsilon_\delta\, \frac{1}{h_j(n)} \int_{\{|\zeta| < \delta\}} K\!\left(\frac{\zeta}{h_j(n)}\right) d\zeta = \varepsilon_\delta \int_{\{|y| < \delta/h_j(n)\}} K(y)\,dy \leq \varepsilon_\delta \int_{\mathbb{R}} K(y)\,dy,
\]
where $\varepsilon_\delta = \sup_{|\zeta| < \delta} |f(x + \zeta) - f(x)|$ can be made arbitrarily small by the choice of $\delta$. Thus, for every $j = 1, \ldots, m$,
\[
E\left[\frac{\phi_j(X_i)}{h_j(n)} K\!\left(\frac{x - X_i}{h_j(n)}\right)\right] \longrightarrow f(x)\,\phi_j(x) \int_{\mathbb{R}} K(t)\,dt, \qquad n \to \infty.
\]
After summing both sides over $j$, the proof is completed.

Remark 2.9. Under the additional assumption that $\int_{\mathbb{R}} K(t)\,dt = 1$, we have $E\hat f_\phi(x) - f(x) \to 0$ as $n \to \infty$, and hence the generalized kernel estimator is asymptotically unbiased. Provided that, for every $j = 1, \ldots, m$, $n\,h_j(n) \to \infty$ for $n \to \infty$, we obtain $V\hat f_\phi(x) \to 0$ as $n \to \infty$.

3. Criterion for the bandwidth choice. As in the case of the traditional kernel estimator (1.1), we will use the asymptotic mean integrated squared error $\mathrm{AMISE}(\hat f_\phi)$ as a criterion for the choice of $h_j(n) > 0$, $h_j(n) \to 0$ for $n \to \infty$, $j = 1, \ldots, m$. We obtain the formula for $\mathrm{AMISE}(\hat f_\phi)$ by asymptotic approximations of $E\hat f_\phi(x) - f(x)$ and $V\hat f_\phi(x)$. We will use a well-known corollary of Taylor's formula.

Corollary 3.1. Suppose that $g$ is an arbitrary function, $(n-1)$-times differentiable in a neighbourhood of $x_0$, and that the $n$-th derivative $g^{(n)}(x_0)$ exists at $x_0$. Then
\[
\forall_{\varepsilon > 0}\ \exists_{\delta > 0}\ \forall_{0 < |\theta| < \delta}: \qquad \left| g(x_0 + \theta) - g(x_0) - \theta g'(x_0) - \ldots - \frac{\theta^n}{n!} g^{(n)}(x_0) \right| < \frac{|\theta|^n}{n!}\,\varepsilon.
\]

Under the assumptions from the last section we will formulate the criterion for the choice of the sequence of bandwidths. From now on, we assume that $K$ is a measurable nonnegative function such that $\int_{\mathbb{R}} t^2 K(t)\,dt < \infty$ and $\int_{\mathbb{R}} K(t)\,dt = 1$. We will consider two cases, a symmetric and a nonsymmetric kernel, starting with the symmetric case.

Theorem 3.2. Under the assumptions from the last section, suppose that the functions $\phi_j f$, $j = 1, \ldots, m$, are differentiable in a neighbourhood of $x$, that the second derivatives $(\phi_j(x) f(x))''$, $j = 1, \ldots, m$, exist at $x$, and that the kernel $K$ satisfies $\int_{\mathbb{R}} t K(t)\,dt = 0$. Then
\[
E\hat f_\phi(x) = f(x) + \frac{1}{2} \sum_{j=1}^{m} h_j^2(n)\, (\phi_j(x) f(x))'' \int_{\mathbb{R}} t^2 K(t)\,dt + o\!\left(\sum_{j=1}^{m} h_j^2(n)\right),
\]
\[
V\hat f_\phi(x) = \frac{1}{n} \sum_{j,k=1}^{m} \frac{1}{h_j(n) h_k(n)} \int_{\mathbb{R}} K\!\left(\frac{x - y}{h_j(n)}\right) K\!\left(\frac{x - y}{h_k(n)}\right) \phi_j(y)\phi_k(y) f(y)\,dy + o\!\left(\frac{1}{n \min_j h_j(n)}\right).
\]

Proof. Let $\varepsilon > 0$. By Corollary 3.1, for every $j = 1, \ldots, m$ there exists a $\delta > 0$ such that, for $0 < |h_j(n) t| < \delta$,
\[
\left| \phi_j(x + h_j(n)t) f(x + h_j(n)t) - \phi_j(x) f(x) - h_j(n) t\, (\phi_j(x) f(x))' - \frac{h_j^2(n) t^2}{2!}\, (\phi_j(x) f(x))'' \right| < \frac{h_j^2(n) t^2}{2!}\,\varepsilon.
\]

Multiplying the inequalities
\[
-\frac{h_j^2(n) t^2}{2!}\,\varepsilon < \phi_j(x + h_j(n)t) f(x + h_j(n)t) - \phi_j(x) f(x) - h_j(n) t\, (\phi_j(x) f(x))' - \frac{h_j^2(n) t^2}{2!}\, (\phi_j(x) f(x))'' < \frac{h_j^2(n) t^2}{2!}\,\varepsilon
\]
by $K(t)$ and integrating with respect to $t$, we obtain
\[
-\frac{h_j^2(n)\,\varepsilon}{2!} \int_{\mathbb{R}} t^2 K(t)\,dt < \int_{\mathbb{R}} \phi_j(x + h_j(n)t) f(x + h_j(n)t) K(t)\,dt - \phi_j(x) f(x) - \frac{h_j^2(n)}{2}\, (\phi_j(x) f(x))'' \int_{\mathbb{R}} t^2 K(t)\,dt < \frac{h_j^2(n)\,\varepsilon}{2!} \int_{\mathbb{R}} t^2 K(t)\,dt,
\]
provided that $K$ is a symmetric density (so that $\int_{\mathbb{R}} t K(t)\,dt = 0$). Summing over $j = 1, \ldots, m$ and using (2.3) together with $\sum_{j=1}^{m} \phi_j(x) = 1$, we get
\[
-\frac{\varepsilon}{2} \sum_{j=1}^{m} h_j^2(n) \int_{\mathbb{R}} t^2 K(t)\,dt < E\hat f_\phi(x) - f(x) - \frac{1}{2} \sum_{j=1}^{m} h_j^2(n)\, (\phi_j(x) f(x))'' \int_{\mathbb{R}} t^2 K(t)\,dt < \frac{\varepsilon}{2} \sum_{j=1}^{m} h_j^2(n) \int_{\mathbb{R}} t^2 K(t)\,dt.
\]
Since $h_j(n) > 0$, $j = 1, \ldots, m$, denoting $\mu_2 = \int_{\mathbb{R}} t^2 K(t)\,dt$, we obtain
\[
\left| E\hat f_\phi(x) - f(x) - \frac{\mu_2}{2} \sum_{j=1}^{m} h_j^2(n)\, (\phi_j(x) f(x))'' \right| < \frac{\varepsilon\,\mu_2}{2} \sum_{j=1}^{m} h_j^2(n).
\]
Observe that $h_k^2(n) \leq \sum_{j=1}^{m} h_j^2(n)$ and $h_j(n) \to 0$ as $n \to \infty$ for $j = 1, \ldots, m$. Since $\varepsilon > 0$ is arbitrary, we get
\[
E\hat f_\phi(x) - f(x) - \frac{\mu_2}{2} \sum_{j=1}^{m} h_j^2(n)\, (\phi_j(x) f(x))'' = o\!\left(\sum_{j=1}^{m} h_j^2(n)\right).
\]

Now, since
\[
V\hat f_\phi(x) = \frac{1}{n^2} \sum_{i=1}^{n} \left\{ E\left[\sum_{j=1}^{m} \frac{\phi_j(X_i)}{h_j(n)} K\!\left(\frac{x - X_i}{h_j(n)}\right) \sum_{k=1}^{m} \frac{\phi_k(X_i)}{h_k(n)} K\!\left(\frac{x - X_i}{h_k(n)}\right)\right] - \left[\sum_{j=1}^{m} E\left(\frac{\phi_j(X_i)}{h_j(n)} K\!\left(\frac{x - X_i}{h_j(n)}\right)\right)\right]^{2} \right\},
\]
applying Lemma 2.8 and Remark 2.9 we conclude that the second term is $o\!\left(\frac{1}{n \min_j h_j(n)}\right)$, while, by (2.4), the first term equals
\[
\frac{1}{n} \sum_{j,k=1}^{m} \frac{1}{h_j(n) h_k(n)} \int_{\mathbb{R}} K\!\left(\frac{x - y}{h_j(n)}\right) K\!\left(\frac{x - y}{h_k(n)}\right) \phi_j(y)\phi_k(y) f(y)\,dy,
\]
which completes the proof.

In the nonsymmetric case we proceed in the following way.

Theorem 3.3. Under the assumptions from the last section, if the functions $\phi_j f$ are differentiable at $x$ for $j = 1, \ldots, m$, and $\int_{\mathbb{R}} t K(t)\,dt \neq 0$ with $\left|\int_{\mathbb{R}} t K(t)\,dt\right| < \infty$, then
\[
E\hat f_\phi(x) = f(x) + \sum_{j=1}^{m} h_j(n)\, (\phi_j(x) f(x))' \int_{\mathbb{R}} t K(t)\,dt + o\!\left(\sum_{j=1}^{m} h_j(n)\right),
\]
\[
V\hat f_\phi(x) = \frac{1}{n} \sum_{j,k=1}^{m} \frac{1}{h_j(n) h_k(n)} \int_{\mathbb{R}} K\!\left(\frac{x - y}{h_j(n)}\right) K\!\left(\frac{x - y}{h_k(n)}\right) \phi_j(y)\phi_k(y) f(y)\,dy + o\!\left(\frac{1}{n \min_j h_j(n)}\right).
\]

Proof. Let $\varepsilon > 0$ be arbitrary. By the Peano formula, for every $j = 1, \ldots, m$ there exists a $\delta > 0$ such that, for $0 < |h_j(n) t| < \delta$,
\[
\left| \phi_j(x + h_j(n)t) f(x + h_j(n)t) - \phi_j(x) f(x) - h_j(n) t\, (\phi_j(x) f(x))' \right| < h_j(n)\,|t|\,\varepsilon.
\]
Multiplying by $K(t)$ and integrating with respect to $t$, we get, by the assumptions on $K$,
\[
\left| \int_{\mathbb{R}} \phi_j(x + h_j(n)t) f(x + h_j(n)t) K(t)\,dt - \phi_j(x) f(x) - h_j(n)\, (\phi_j(x) f(x))'\,\mu_1 \right| < \varepsilon\, h_j(n)\,\mu_1, \qquad \text{where } \mu_1 = \int_{\mathbb{R}} t K(t)\,dt.
\]

Summing over $j = 1, \ldots, m$ and using (2.3), we obtain
\[
\left| E\hat f_\phi(x) - f(x) - \sum_{j=1}^{m} h_j(n)\, (\phi_j(x) f(x))'\,\mu_1 \right| < \varepsilon\,\mu_1 \sum_{j=1}^{m} h_j(n).
\]
Since, for every $j$, $h_j(n) \to 0$ as $n \to \infty$ and $\frac{h_j(n)}{\sum_{k} h_k(n)}$ is bounded, and $\varepsilon > 0$ is arbitrary,
\[
E\hat f_\phi(x) - f(x) - \mu_1 \sum_{j=1}^{m} h_j(n)\, (\phi_j(x) f(x))' = o\!\left(\sum_{j=1}^{m} h_j(n)\right).
\]
In the second part of the proof we can proceed as in the symmetric case (see the proof of Theorem 3.2).

Under the above assumptions, the following corollaries are true.

Corollary 3.4. By Theorem 3.2, if $K$ is symmetric and
\[
\int_{\mathbb{R}} \left| (\phi_j(x) f(x))''\, (\phi_k(x) f(x))'' \right| dx < \infty
\]
for every $j, k = 1, \ldots, m$, then the asymptotic mean integrated squared error is given by
\[
(3.1)\qquad \mathrm{AMISE}(\hat f_\phi) = \sum_{j=1}^{m} \sum_{k=1}^{m} \left[ \frac{1}{4}\, h_j^2(n) h_k^2(n)\,\mu_2^2 \int_{\mathbb{R}} (\phi_j(x) f(x))''\, (\phi_k(x) f(x))''\,dx + \frac{1}{n\, h_k(n)} \int_{\mathbb{R}} K(t)\, K\!\left(\frac{t\, h_j(n)}{h_k(n)}\right) dt \int_{\mathbb{R}} \phi_j(y)\phi_k(y) f(y)\,dy \right],
\]
where $\mu_2 = \int_{\mathbb{R}} t^2 K(t)\,dt$.

Corollary 3.5. By Theorem 3.3, if $K$ is nonsymmetric and
\[
\int_{\mathbb{R}} \left| (\phi_j(x) f(x))'\, (\phi_k(x) f(x))' \right| dx < \infty
\]

for every $j, k = 1, \ldots, m$, then the asymptotic mean integrated squared error is given by
\[
(3.2)\qquad \mathrm{AMISE}(\hat f_\phi) = \sum_{j=1}^{m} \sum_{k=1}^{m} \left[ h_j(n) h_k(n)\,\mu_1^2 \int_{\mathbb{R}} (\phi_j(x) f(x))'\, (\phi_k(x) f(x))'\,dx + \frac{1}{n\, h_k(n)} \int_{\mathbb{R}} K(t)\, K\!\left(\frac{t\, h_j(n)}{h_k(n)}\right) dt \int_{\mathbb{R}} \phi_j(y)\phi_k(y) f(y)\,dy \right],
\]
where $\mu_1 = \int_{\mathbb{R}} t K(t)\,dt$.

Unfortunately, the complexity of formulae (3.1) and (3.2) prevents the derivation of explicit formulae for $h_j(n)$, $j = 1, \ldots, m$; hence the choice of the sequence of bandwidths should be done numerically, by minimization of AMISE for a symmetric or a nonsymmetric kernel, respectively. Obviously, the above formulae get simpler if we know the kernel. For example, in the case of the Gaussian kernel and for fixed $j, k$, we get
\[
\mu_2 = \int_{\mathbb{R}} t^2 K_g(t)\,dt = 1 \quad \text{and} \quad \int_{\mathbb{R}} K_g(t)\, K_g\!\left(\frac{t\, h_j(n)}{h_k(n)}\right) dt = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-\frac{t^2}{2}\left(1 + \frac{h_j^2(n)}{h_k^2(n)}\right)} dt = \frac{h_k(n)}{\sqrt{2\pi\left(h_j^2(n) + h_k^2(n)\right)}}.
\]
Therefore,
\[
\mathrm{AMISE}(\hat f_\phi) = \sum_{j=1}^{m} \sum_{k=1}^{m} \left[ \frac{1}{4}\, h_j^2(n) h_k^2(n) \int_{\mathbb{R}} (\phi_j(x) f(x))''\, (\phi_k(x) f(x))''\,dx + \frac{1}{\sqrt{2\pi}\, n \sqrt{h_j^2(n) + h_k^2(n)}} \int_{\mathbb{R}} \phi_j(x)\phi_k(x) f(x)\,dx \right].
\]
In the case of the rectangular kernel, for fixed $j, k$,
\[
\mu_2 = \int_{\mathbb{R}} t^2 K_p(t)\,dt = \frac{1}{3} \quad \text{and} \quad \int_{\mathbb{R}} K_p(t)\, K_p\!\left(\frac{t\, h_j(n)}{h_k(n)}\right) dt = \frac{\min\{h_j(n), h_k(n)\}}{2\, h_j(n)},
\]
and hence
\[
\mathrm{AMISE}(\hat f_\phi) = \sum_{j=1}^{m} \sum_{k=1}^{m} \left[ \frac{1}{36}\, h_j^2(n) h_k^2(n) \int_{\mathbb{R}} (\phi_j(x) f(x))''\, (\phi_k(x) f(x))''\,dx + \frac{\min\{h_j(n), h_k(n)\}}{2 n\, h_j(n) h_k(n)} \int_{\mathbb{R}} \phi_j(x)\phi_k(x) f(x)\,dx \right].
\]

Using the nonsymmetric exponential kernel, we obtain
\[
\mathrm{AMISE}(\hat f_\phi) = \sum_{j=1}^{m} \sum_{k=1}^{m} \left[ h_j(n) h_k(n) \int_{\mathbb{R}} (\phi_j(x) f(x))'\, (\phi_k(x) f(x))'\,dx + \frac{1}{n\,(h_j(n) + h_k(n))} \int_{\mathbb{R}} \phi_j(x)\phi_k(x) f(x)\,dx \right],
\]
since, for fixed $j, k$,
\[
\mu_1 = \int_{\mathbb{R}} t K_w(t)\,dt = 1 \quad \text{and} \quad \int_{\mathbb{R}} K_w(t)\, K_w\!\left(\frac{t\, h_j(n)}{h_k(n)}\right) dt = \int_{0}^{\infty} e^{-t}\, e^{-t\, h_j(n)/h_k(n)}\,dt = \frac{h_k(n)}{h_j(n) + h_k(n)}.
\]

Remark 3.6. If $K$ is symmetric, taking $m = 1$ and $\phi_1 \equiv 1$ (see Remark 2.4), by (3.1) we obtain
\[
\mathrm{AMISE}(\hat f_\phi) = \frac{h(n)^4}{4} \left(\int_{\mathbb{R}} t^2 K(t)\,dt\right)^2 \int_{\mathbb{R}} \big((\phi_1(x) f(x))''\big)^2\,dx + \frac{1}{n\,h(n)} \int_{\mathbb{R}} K^2(t)\,dt \int_{\mathbb{R}} \phi_1^2(x) f(x)\,dx = \frac{h(n)^4}{4} \left(\int_{\mathbb{R}} t^2 K(t)\,dt\right)^2 \int_{\mathbb{R}} (f''(x))^2\,dx + \frac{1}{n\,h(n)} \int_{\mathbb{R}} K^2(t)\,dt,
\]
which gives the formula for AMISE for the traditional kernel estimator (1.1).

Remark 3.7. Using a nonsymmetric kernel, for $m = 1$ and $\phi_1 \equiv 1$, by (3.2) we get the formula for AMISE for the estimator (1.1):
\[
\mathrm{AMISE}(\hat f_\phi) = h(n)^2 \left(\int_{\mathbb{R}} t K(t)\,dt\right)^2 \int_{\mathbb{R}} \big((\phi_1(x) f(x))'\big)^2\,dx + \frac{1}{n\,h(n)} \int_{\mathbb{R}} K^2(t)\,dt \int_{\mathbb{R}} \phi_1^2(x) f(x)\,dx = h(n)^2 \left(\int_{\mathbb{R}} t K(t)\,dt\right)^2 \int_{\mathbb{R}} (f'(x))^2\,dx + \frac{1}{n\,h(n)} \int_{\mathbb{R}} K^2(t)\,dt
\]
(see [7] for results on the nonsymmetric case in the traditional kernel estimation).

In Section 4 we will construct the pilot function by parametric estimation. The main problem is the choice of the number of components in the model
\[
\alpha_1 f_1(x\,|\,\theta_1) + \alpha_2 f_2(x\,|\,\theta_2) + \ldots + \alpha_m f_m(x\,|\,\theta_m).
\]
This is a known and much investigated problem in many methods of parametric estimation. However, in our case it is not so important, since the pilot is just

a tool for further estimation. So we suggest the choice of $m = 2$, because it does not decrease the efficiency of the method considered.

Remark 3.8. In the case $m = 2$, we choose the two bandwidths $h_1(n)$ and $h_2(n)$ by minimizing:
for the Gaussian kernel,
\[
(3.3)\qquad \mathrm{AMISE}(\hat f_\phi) = \frac{1}{2} h_1^2(n) h_2^2(n) \int_{\mathbb{R}} (\phi_1(x) f(x))''\, (\phi_2(x) f(x))''\,dx + \frac{1}{4} h_1^4(n) \int_{\mathbb{R}} \big((\phi_1(x) f(x))''\big)^2\,dx + \frac{1}{4} h_2^4(n) \int_{\mathbb{R}} \big((\phi_2(x) f(x))''\big)^2\,dx + \frac{1}{2\sqrt{\pi}\, n\, h_1(n)} \int_{\mathbb{R}} \phi_1^2(x) f(x)\,dx + \frac{1}{2\sqrt{\pi}\, n\, h_2(n)} \int_{\mathbb{R}} \phi_2^2(x) f(x)\,dx + \frac{\sqrt{2}}{\sqrt{\pi}\, n \sqrt{h_1^2(n) + h_2^2(n)}} \int_{\mathbb{R}} \phi_1(x)\phi_2(x) f(x)\,dx;
\]
for the rectangular kernel,
\[
\mathrm{AMISE}(\hat f_\phi) = \frac{1}{18} h_1^2(n) h_2^2(n) \int_{\mathbb{R}} (\phi_1(x) f(x))''\, (\phi_2(x) f(x))''\,dx + \frac{1}{36} h_1^4(n) \int_{\mathbb{R}} \big((\phi_1(x) f(x))''\big)^2\,dx + \frac{1}{36} h_2^4(n) \int_{\mathbb{R}} \big((\phi_2(x) f(x))''\big)^2\,dx + \frac{1}{2 n\, h_1(n)} \int_{\mathbb{R}} \phi_1^2(x) f(x)\,dx + \frac{1}{2 n\, h_2(n)} \int_{\mathbb{R}} \phi_2^2(x) f(x)\,dx + \frac{\min\{h_1(n), h_2(n)\}}{n\, h_1(n) h_2(n)} \int_{\mathbb{R}} \phi_1(x)\phi_2(x) f(x)\,dx;
\]
for the exponential kernel,
\[
\mathrm{AMISE}(\hat f_\phi) = h_1^2(n) \int_{\mathbb{R}} \big((\phi_1(x) f(x))'\big)^2\,dx + h_2^2(n) \int_{\mathbb{R}} \big((\phi_2(x) f(x))'\big)^2\,dx + \frac{1}{2 n\, h_1(n)} \int_{\mathbb{R}} \phi_1^2(x) f(x)\,dx + \frac{1}{2 n\, h_2(n)} \int_{\mathbb{R}} \phi_2^2(x) f(x)\,dx + 2 h_1(n) h_2(n) \int_{\mathbb{R}} (\phi_1(x) f(x))'\, (\phi_2(x) f(x))'\,dx + \frac{2}{n\,(h_1(n) + h_2(n))} \int_{\mathbb{R}} \phi_1(x)\phi_2(x) f(x)\,dx.
\]
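The minimization of (3.3) has to be carried out numerically. Below is a sketch of one possible way to do this, under the (hypothetical) assumption that the unknown $f$ in the integrals is replaced by the pilot $f_0$, with numerical quadrature and a finite-difference second derivative; the pilot parameters and all names are illustrative, not the paper's prescription.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

# Two-component normal pilot f_0 = a1*N(m1, s1^2) + a2*N(m2, s2^2) (illustrative values).
a = np.array([0.3, 0.7]); m = np.array([-2.0, 1.0]); s2 = np.array([0.25, 1.0])

def comp(x, j):            # alpha_j * f_j(x)
    return a[j] * np.exp(-0.5 * (x - m[j]) ** 2 / s2[j]) / np.sqrt(2 * np.pi * s2[j])

def f0(x):                 # pilot density
    return comp(x, 0) + comp(x, 1)

def phi(x, j):             # weights (2.2)
    return comp(x, j) / f0(x)

def second_deriv(g, x, eps=1e-4):
    return (g(x + eps) - 2 * g(x) + g(x - eps)) / eps ** 2

# Integrals entering (3.3), with f replaced by the pilot f_0.
B = [[quad(lambda x: second_deriv(lambda y: phi(y, j) * f0(y), x)
                     * second_deriv(lambda y: phi(y, k) * f0(y), x), -10, 10)[0]
      for k in range(2)] for j in range(2)]
V = [[quad(lambda x: phi(x, j) * phi(x, k) * f0(x), -10, 10)[0] for k in range(2)]
     for j in range(2)]

def amise(h, n=200):
    h1, h2 = h
    bias = 0.25 * h1**4 * B[0][0] + 0.25 * h2**4 * B[1][1] + 0.5 * h1**2 * h2**2 * B[0][1]
    var = (V[0][0] / (2 * np.sqrt(np.pi) * n * h1)
           + V[1][1] / (2 * np.sqrt(np.pi) * n * h2)
           + np.sqrt(2) * V[0][1] / (np.sqrt(np.pi) * n * np.sqrt(h1**2 + h2**2)))
    return bias + var

res = minimize(amise, x0=[0.3, 0.3], bounds=[(1e-3, 2), (1e-3, 2)])
print(res.x)   # chosen bandwidths h_1(n), h_2(n)
```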

4. Construction of the pilot function (parametric estimation). In this section we present a way of constructing the pilot function, based on a modification of the well-known maximum likelihood method. We apply the Expectation–Maximization algorithm, introduced in [2] and used to estimate the parameters in stochastic models, in hidden Markov models, to estimate a hazard function, as well as to estimate the parameters in mixture models. The last of these applications is the matter of our interest and will be used in the normal mixture model
\[
\alpha_1 N(\mu_1, \sigma_1^2) + \alpha_2 N(\mu_2, \sigma_2^2) + \ldots + \alpha_m N(\mu_m, \sigma_m^2).
\]

4.1. The EM algorithm. Let $x = (x_1, x_2, \ldots, x_n)$ be a sample of independent observations from an absolutely continuous distribution with a density function $f_X(x\,|\,\theta)$, where $\theta$ denotes the parameters of the distribution.

Definition 4.1. We define the likelihood function and the log-likelihood function
\[
L_x(\theta\,|\,x) := \prod_{i=1}^{n} f_X(x_i\,|\,\theta), \qquad l_x(\theta\,|\,x) := \ln L_x(\theta\,|\,x) = \sum_{i=1}^{n} \ln f_X(x_i\,|\,\theta).
\]
The maximum likelihood estimator of $\theta$ is given by
\[
\hat\theta = \operatorname*{argmax}_{\theta \in \Theta} L_x(\theta\,|\,x) = \operatorname*{argmax}_{\theta \in \Theta} l_x(\theta\,|\,x).
\]

To present the EM algorithm we consider the data model
\[
(4.1)\qquad \{(x_i, y_i),\ i = 1, \ldots, n\},
\]
from the distribution of the random variable $(X, Y)$, where $x = (x_1, \ldots, x_n)$ is a sample of independent observations from the absolutely continuous distribution of $X$, with the density $f_X$ treated as a marginal density of $(X, Y)$, and $y = (y_1, \ldots, y_n)$ denotes the latent (or missing) data. The sample $x$, together with $y$, is called the complete data.

Definition 4.2. For the data model (4.1), the likelihood and log-likelihood functions are defined as follows:
\[
L(\theta\,|\,x, y) = \prod_{i=1}^{n} f(x_i, y_i\,|\,\theta), \qquad l(\theta\,|\,x, y) = \ln L(\theta\,|\,x, y) = \sum_{i=1}^{n} \ln f(x_i, y_i\,|\,\theta).
\]
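A small numerical illustration of the maximum likelihood estimator from Definition 4.1, under the (hypothetical) assumption of a normal model; the data, the grid, and all names are illustrative.

```python
import numpy as np

def log_likelihood(theta, x):
    """l_x(theta | x) for a normal model with theta = (mu, sigma2)."""
    mu, sigma2 = theta
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - 0.5 * (x - mu) ** 2 / sigma2)

rng = np.random.default_rng(5)
x = rng.normal(loc=1.0, scale=2.0, size=200)
# crude argmax over a grid of candidate means (sigma2 fixed at the sample variance)
mus = np.linspace(-1, 3, 401)
best = mus[np.argmax([log_likelihood((m, x.var()), x) for m in mus])]
print(best, x.mean())   # the grid maximizer agrees with the closed-form MLE mu_hat = mean(x)
```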

From now on, we will use the following notation:
\[
(4.4)\qquad f_X(x\,|\,\theta) = \prod_{i=1}^{n} f_X(x_i\,|\,\theta), \qquad f(x, y\,|\,\theta) = \prod_{i=1}^{n} f(x_i, y_i\,|\,\theta), \qquad f_{Y|X}(y\,|\,x, \theta) = \prod_{i=1}^{n} f_{Y|X}(y_i\,|\,x_i, \theta).
\]

Corollary 4.3. For the considered data model, the maximum likelihood estimator of $\theta$ is given by
\[
\hat\theta = \operatorname*{argmax}_{\theta \in \Theta} \big( l(\theta\,|\,x, y) - \ln f_{Y|X}(y\,|\,x, \theta) \big),
\]
where $f_{Y|X}(y\,|\,x, \theta)$ denotes the conditional density of $Y$, given $X = x$ and $\theta$.

Proof. By the definition of conditional density,
\[
f_X(x\,|\,\theta) = \frac{f(x, y\,|\,\theta)}{f_{Y|X}(y\,|\,x, \theta)},
\]
and hence, by Definition 4.1,
\[
\hat\theta = \operatorname*{argmax}_{\theta \in \Theta} l_x(\theta\,|\,x) = \operatorname*{argmax}_{\theta \in \Theta} \big( \ln f(x, y\,|\,\theta) - \ln f_{Y|X}(y\,|\,x, \theta) \big) = \operatorname*{argmax}_{\theta \in \Theta} \big( l(\theta\,|\,x, y) - \ln f_{Y|X}(y\,|\,x, \theta) \big).
\]

Now let
\[
(4.7)\qquad Q(\theta, \theta^t) := E\big[\ln f(x, y\,|\,\theta)\,\big|\,x, \theta^t\big], \qquad (4.8)\qquad H(\theta, \theta^t) := E\big[\ln f_{Y|X}(y\,|\,x, \theta)\,\big|\,x, \theta^t\big],
\]
where $E[\,\cdot\,|\,x, \theta^t]$ denotes the conditional expected value, given $X = x$ and a fixed $\theta = \theta^t$, and $\theta^t$ is the value of the parameter $\theta$ in iteration $t$. The EM algorithm gives an iterative procedure that maximizes $Q(\theta, \theta^t)$ for every $t = 0, 1, 2, \ldots$, giving $\theta^{t+1} = \operatorname{argmax}_\theta Q(\theta, \theta^t)$. After a sufficient number of iterations, $\operatorname{argmax}_\theta l_x(\theta\,|\,x)$ will be found with a high probability (see [2], [16]).

Theorem 4.4. Under the above assumptions, the procedure
\[
\theta^{t+1} = \operatorname*{argmax}_{\theta \in \Theta} Q(\theta, \theta^t), \qquad t = 0, 1, 2, \ldots,
\]

guarantees that
\[
l_x(\theta^{t+1}\,|\,x) \geq l_x(\theta^{t}\,|\,x),
\]
with equality iff $Q(\theta^{t+1}, \theta^t) = Q(\theta^{t}, \theta^t)$, for $t = 0, 1, 2, \ldots$

Proof. Consider the expression $Q(\theta^{t+1}, \theta^t) - H(\theta^{t+1}, \theta^t)$. From formulae (4.7) and (4.8), it follows that
\[
Q(\theta^{t+1}, \theta^t) - H(\theta^{t+1}, \theta^t) = E\big[\ln f(x, y\,|\,\theta^{t+1})\,\big|\,x, \theta^t\big] - E\big[\ln f_{Y|X}(y\,|\,x, \theta^{t+1})\,\big|\,x, \theta^t\big] = E\left[\ln \frac{f(x, y\,|\,\theta^{t+1})}{f_{Y|X}(y\,|\,x, \theta^{t+1})}\,\Big|\,x, \theta^t\right],
\]
and by the definition of conditional density and (4.4),
\[
l_x(\theta^{t+1}\,|\,x) = \ln \frac{f(x, y\,|\,\theta^{t+1})}{f_{Y|X}(y\,|\,x, \theta^{t+1})}.
\]
Thus
\[
Q(\theta^{t+1}, \theta^t) - H(\theta^{t+1}, \theta^t) = E\big[l_x(\theta^{t+1}\,|\,x)\,\big|\,x, \theta^t\big] = l_x(\theta^{t+1}\,|\,x).
\]
Consequently,
\[
l_x(\theta^{t+1}\,|\,x) - l_x(\theta^{t}\,|\,x) = \big( Q(\theta^{t+1}, \theta^t) - Q(\theta^{t}, \theta^t) \big) + \big( H(\theta^{t}, \theta^t) - H(\theta^{t+1}, \theta^t) \big) \geq 0,
\]
provided that both bracketed terms are nonnegative. The first term, $Q(\theta^{t+1}, \theta^t) - Q(\theta^{t}, \theta^t) \geq 0$, by the choice of $\theta^{t+1}$ as a maximizer of $Q(\cdot, \theta^t)$. The second one, by (4.8) and the definition of the conditional expected value, can be written as
\[
H(\theta^{t}, \theta^t) - H(\theta^{t+1}, \theta^t) = E\left[\ln \frac{f_{Y|X}(y\,|\,x, \theta^{t})}{f_{Y|X}(y\,|\,x, \theta^{t+1})}\,\Big|\,x, \theta^t\right] = \int \ln \frac{f_{Y|X}(y\,|\,x, \theta^{t})}{f_{Y|X}(y\,|\,x, \theta^{t+1})}\, f_{Y|X}(y\,|\,x, \theta^{t})\,dy.
\]
As
\[
\ln \frac{f_{Y|X}(y\,|\,x, \theta^{t})}{f_{Y|X}(y\,|\,x, \theta^{t+1})} = -\ln\!\left( 1 + \frac{f_{Y|X}(y\,|\,x, \theta^{t+1}) - f_{Y|X}(y\,|\,x, \theta^{t})}{f_{Y|X}(y\,|\,x, \theta^{t})} \right)
\]
and $\ln(1 + t) < t$ for every $t > -1$, $t \neq 0$, the proof is completed.

The EM algorithm can be stated as follows.

Start: Set the initial value $\theta^0$. Each step consists of two substeps; for every $t = 0, 1, 2, \ldots$ we proceed as follows.

Expectation step: Let $\theta^t$ be the current estimate of the unknown parameters $\theta$. By (4.7), we calculate
\[
Q(\theta, \theta^t) = E\big[\ln f(x, y\,|\,\theta)\,\big|\,x, \theta^t\big] = \int \ln f(x, y\,|\,\theta)\, f_{Y|X}(y\,|\,x, \theta^t)\,dy = \int \ln f(x, y\,|\,\theta)\, \frac{f(x, y\,|\,\theta^t)}{f_X(x\,|\,\theta^t)}\,dy.
\]

Maximization step: Since $f_X(x\,|\,\theta^t)$ does not depend on $\theta$, we get
\[
\theta^{t+1} = \operatorname*{argmax}_{\theta} Q(\theta, \theta^t) = \operatorname*{argmax}_{\theta} \int \ln f(x, y\,|\,\theta)\, f(x, y\,|\,\theta^t)\,dy.
\]

By iterating the EM steps, the algorithm converges to parameters that should be the maximum likelihood parameters, but the convergence speed is rather low (see e.g. [16]). There is some research on accelerating the convergence of the EM algorithm, but the resulting procedures are usually difficult to carry out. In this paper we focus on the traditional form of the EM algorithm and apply it to the estimation of a mixture of normal densities.

4.2. Fitting the mixture of normal densities. As previously, consider $x = (x_1, x_2, \ldots, x_n)$, a sample of independent observations from a density $f_X(x\,|\,\theta)$, where $\theta = (\theta_1, \ldots, \theta_m)$ denotes the parameters of the distribution, and consider the following model. Suppose that $f_X$ is given by the mixture density
\[
f_X(x\,|\,\theta) = \sum_{j=1}^{m} \alpha_j f_j(x\,|\,\theta_j), \qquad \text{where} \quad f_j(x\,|\,\theta_j) = \frac{1}{\sqrt{2\pi\sigma_j^2}}\, e^{-\frac{(x - \mu_j)^2}{2\sigma_j^2}}, \quad \theta_j = (\mu_j, \sigma_j^2),
\]
and
\[
\alpha_j > 0, \quad j = 1, \ldots, m, \qquad \sum_{j=1}^{m} \alpha_j = 1.
\]
We call $\alpha_j$ the prior probability of the $j$-th mixture component, for $j = 1, \ldots, m$.

Suppose also, as in (4.1), that $y = (y_1, y_2, \ldots, y_n)$ denotes latent data from the distribution of the random variable $Y = (Y_1, Y_2, \ldots, Y_n)$, such that $P(Y_i = j) = \alpha_j$, $j = 1, \ldots, m$, $i = 1, \ldots, n$. We calculate $Q(\theta, \theta^t)$ for the above model. The conditional density of $Y$, given $X = x$ and $\theta^t$, is given by
\[
(4.9)\qquad f_{Y|X}(y\,|\,x, \theta^t) = \prod_{i=1}^{n} f_{Y|X}(y_i\,|\,x_i, \theta^t),
\]
and the joint density of $X$ and $Y$ is
\[
(4.10)\qquad f(x, y\,|\,\theta) = \prod_{i=1}^{n} f(x_i, y_i\,|\,\theta),
\]
which by the definition of conditional density gives
\[
(4.11)\qquad f(x, y\,|\,\theta) = \prod_{i=1}^{n} \alpha_{y_i} f_{y_i}(x_i\,|\,\theta_{y_i}).
\]
From formulae (4.9), (4.11), and (4.7), we obtain
\[
(4.12)\qquad Q(\theta, \theta^t) = E\big[\ln f(x, y\,|\,\theta)\,\big|\,x, \theta^t\big] = E\Big[\ln \prod_{i=1}^{n} \alpha_{y_i} f_{y_i}(x_i\,|\,\theta_{y_i})\,\Big|\,x, \theta^t\Big] = \sum_{j=1}^{m} \sum_{i=1}^{n} \ln\big( \alpha_j f_j(x_i\,|\,\theta_j) \big)\, P(j\,|\,x_i, \theta^t).
\]
Thus, in each iteration $t$ of the EM algorithm, the expectation step calculates $q_{ij}^t = P(j\,|\,x_i, \theta^t)$. From now on, we will denote by $\alpha_j^t$, $\mu_j^t$, and $(\sigma_j^t)^2$ the estimators of the parameters $\alpha_j$, $\mu_j$, $\sigma_j^2$ in iteration $t$.

Lemma 4.5. Under the above assumptions,
\[
(4.13)\qquad \alpha_j^t = \frac{1}{n} \sum_{i=1}^{n} q_{ij}^t,
\]
\[
(4.14)\qquad \mu_j^t = \frac{\sum_{i=1}^{n} q_{ij}^t\, x_i}{\sum_{i=1}^{n} q_{ij}^t},
\]
\[
(4.15)\qquad (\sigma_j^t)^2 = \frac{\sum_{i=1}^{n} (x_i - \mu_j^t)^2\, q_{ij}^t}{\sum_{i=1}^{n} q_{ij}^t}.
\]

Proof. To find $\alpha_j$, we use the Lagrange multiplier $\lambda$ with the constraint $\sum_{j=1}^{m} \alpha_j = 1$. For $j = 1, \ldots, m$, we calculate
\[
\frac{\partial}{\partial \alpha_j}\left[ Q(\theta, \theta^t) + \lambda\Big( 1 - \sum_{j=1}^{m} \alpha_j \Big) \right] = 0.
\]
By formula (4.12),
\[
\frac{\partial}{\partial \alpha_j}\left[ \sum_{j=1}^{m} \sum_{i=1}^{n} \ln\big( \alpha_j f_j(x_i\,|\,\theta_j) \big)\, q_{ij}^t + \lambda\Big( 1 - \sum_{j=1}^{m} \alpha_j \Big) \right] = 0, \qquad j = 1, \ldots, m,
\]
and hence
\[
\sum_{i=1}^{n} q_{ij}^t = \lambda\,\alpha_j, \qquad j = 1, \ldots, m.
\]
Summing both sides over $j$, we get $\lambda = n$, and thus (4.13). To prove (4.14) and (4.15), we maximize the function
\[
Q(\theta, \theta^t) = \sum_{j=1}^{m} \sum_{i=1}^{n} \ln\big( \alpha_j f_j(x_i\,|\,\theta_j) \big)\, q_{ij}^t.
\]
By the assumptions of the model,
\[
f_j(x\,|\,\theta_j) = \frac{1}{\sqrt{2\pi\sigma_j^2}}\, e^{-\frac{(x - \mu_j)^2}{2\sigma_j^2}}, \qquad \theta_j = (\mu_j, \sigma_j^2).
\]
We get the function given by
\[
(4.16)\qquad Q(\mu_j, \sigma_j) = \sum_{j=1}^{m} \sum_{i=1}^{n} \left( \ln \alpha_j - \frac{1}{2}\ln(2\pi) - \ln \sigma_j - \frac{1}{2}\left(\frac{x_i - \mu_j}{\sigma_j}\right)^{2} \right) q_{ij}^t.
\]
By differentiating (4.16) with respect to $\mu_j$ and $\sigma_j$, for $j = 1, \ldots, m$, we obtain
\[
\frac{\partial Q}{\partial \mu_j} = \sum_{i=1}^{n} \frac{x_i - \mu_j}{\sigma_j^2}\, q_{ij}^t, \qquad \frac{\partial Q}{\partial \sigma_j} = \sum_{i=1}^{n} \left( -\frac{1}{\sigma_j} + \frac{(x_i - \mu_j)^2}{\sigma_j^3} \right) q_{ij}^t.
\]
Solving the equations
\[
\sum_{i=1}^{n} \frac{x_i - \mu_j}{\sigma_j^2}\, q_{ij}^t = 0 \qquad \text{and} \qquad \sum_{i=1}^{n} \left( -\frac{1}{\sigma_j} + \frac{(x_i - \mu_j)^2}{\sigma_j^3} \right) q_{ij}^t = 0,
\]
we get (4.14) and (4.15).
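A sketch of one run of these updates: the E-step computes the responsibilities $q_{ij}^t = P(j\,|\,x_i, \theta^t)$ and the M-step applies the closed-form maximizers (4.13)–(4.15) of Lemma 4.5. The data, starting values, and function names are illustrative assumptions.

```python
import numpy as np

def normal_pdf(x, mu, sigma2):
    return np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

def em_normal_mixture(x, alpha, mu, sigma2, iters=30):
    """EM for a mixture of m normal densities; alpha, mu, sigma2 have length m."""
    x = np.asarray(x, dtype=float)
    for _ in range(iters):
        # E-step: responsibilities q[i, j] = P(Y_i = j | x_i, theta^t)
        q = np.array([a * normal_pdf(x, m_, s2) for a, m_, s2 in zip(alpha, mu, sigma2)]).T
        q /= q.sum(axis=1, keepdims=True)
        # M-step: closed-form updates (4.13)-(4.15)
        nj = q.sum(axis=0)
        alpha = nj / len(x)
        mu = (q * x[:, None]).sum(axis=0) / nj
        sigma2 = (q * (x[:, None] - mu) ** 2).sum(axis=0) / nj
    return alpha, mu, sigma2

rng = np.random.default_rng(3)
data = np.concatenate([rng.normal(-2, 0.5, 50), rng.normal(1, 1.0, 150)])
print(em_normal_mixture(data, alpha=[0.5, 0.5], mu=[-0.5, 0.5], sigma2=[1.0, 1.0]))
```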

The scheme of the EM algorithm for the mixture of normal densities can be stated as follows.

Start: We set $\alpha_j^0$, $\mu_j^0$, and $(\sigma_j^0)^2$, for $j = 1, 2, \ldots, m$. We iterate, for $t = 1, 2, \ldots$:
E-step: We calculate $q_{ij}^t$.
M-step: We calculate
\[
\alpha_j^t = \frac{1}{n} \sum_{i=1}^{n} q_{ij}^t, \qquad \mu_j^t = \frac{\sum_{i=1}^{n} q_{ij}^t\, x_i}{\sum_{i=1}^{n} q_{ij}^t}, \qquad (\sigma_j^t)^2 = \frac{\sum_{i=1}^{n} (x_i - \mu_j^t)^2\, q_{ij}^t}{\sum_{i=1}^{n} q_{ij}^t}.
\]
The above scheme, with the exact formulae, can easily be used to construct the pilot function.

5. Example. In this section we present an example of how to apply the two-step method in practice. Suppose that we are given a sample of 200 independent observations from an absolutely continuous distribution with an unknown density $f$. In this example we consider the mixture of two Weibull densities, with parameters $\alpha_1 = 0.25$, $a_1 = 2.9$, $b_1 = 3.5$ and $\alpha_2 = 0.75$, $a_2 = 1.6$, $b_2 = 0.6$, which is nonsymmetric and bimodal. At first we construct a pilot function as a mixture of $m = 2$ normal densities,
\[
f_0(x) = \alpha_1 f_1(x\,|\,\mu_1, \sigma_1^2) + \alpha_2 f_2(x\,|\,\mu_2, \sigma_2^2), \qquad \text{where} \quad f_j(x\,|\,\mu_j, \sigma_j^2) = \frac{1}{\sqrt{2\pi\sigma_j^2}}\, e^{-\frac{(x - \mu_j)^2}{2\sigma_j^2}}, \quad j = 1, 2,
\]
using the EM algorithm. We set the initial values of the estimates of the parameters: $\alpha_1^0 = 0.5$, $\alpha_2^0 = 0.5$, $\mu_1^0 = 0.5$, $\mu_2^0 = 0.5$, $(\sigma_1^0)^2 = 0.5$, $(\sigma_2^0)^2 = 0.5$, and after 30 iterations of the EM algorithm we get $\hat\alpha_1 = 0.315$, $\hat\mu_1 = 2.698$, $\hat\sigma_1^2 = 1.560$, $\hat\alpha_2 = 0.685$, $\hat\mu_2 = 0.489$, and $\hat\sigma_2^2$.

Therefore, the pilot is given by the formula
\[
f_0(x) = \frac{\hat\alpha_1}{\sqrt{2\pi\hat\sigma_1^2}}\, e^{-\frac{(x - \hat\mu_1)^2}{2\hat\sigma_1^2}} + \frac{\hat\alpha_2}{\sqrt{2\pi\hat\sigma_2^2}}\, e^{-\frac{(x - \hat\mu_2)^2}{2\hat\sigma_2^2}}.
\]
In the second step we use the generalized kernel density estimator (2.1) with the Gaussian kernel. In this case
\[
\phi_1(x) = \frac{\hat\alpha_1}{\sqrt{2\pi\hat\sigma_1^2}}\, e^{-\frac{(x - \hat\mu_1)^2}{2\hat\sigma_1^2}} \Big/ f_0(x), \qquad \phi_2(x) = \frac{\hat\alpha_2}{\sqrt{2\pi\hat\sigma_2^2}}\, e^{-\frac{(x - \hat\mu_2)^2}{2\hat\sigma_2^2}} \Big/ f_0(x).
\]
We obtain the bandwidths $h_1$ and $h_2$ by minimizing $\mathrm{AMISE}(\hat f_\phi)$ given by formula (3.3); the minimization is carried out numerically. Hence, we have the density estimator given by the formula
\[
\hat f_\phi(x) = \frac{1}{200} \sum_{i=1}^{200} \left[ \frac{\phi_1(x_i)}{h_1}\, K\!\left(\frac{x - x_i}{h_1}\right) + \frac{\phi_2(x_i)}{h_2}\, K\!\left(\frac{x - x_i}{h_2}\right) \right],
\]
where $x_1, \ldots, x_{200}$ denote the given sample, and $\phi_1(x)$, $\phi_2(x)$, $h_1$, $h_2$ are calculated above. To show the results of the estimation, the graphs of the estimator $\hat f_\phi$ and the estimated density are presented in the figure below.

[Figure: the two-step estimator $\hat f_\phi$ (solid line) and the estimated density $f$ (dotted line).]

One can see that the estimator $\hat f_\phi$ (solid line) is very close to the estimated density $f$ (dotted line), which gives evidence of the high efficiency of the proposed method.
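A compact end-to-end sketch of the two steps on a sample of this kind, under stated assumptions: the Weibull parameters are read as shape/scale pairs, the unknown $f$ in the AMISE integrals is replaced by the pilot $f_0$, and grid quadrature is used; all names and numerical results are illustrative and will not reproduce the paper's exact values.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

rng = np.random.default_rng(4)

# Step 0: a sample of 200 observations from a Weibull mixture
# (shape/scale parametrization assumed: a_j = shape, b_j = scale).
n = 200
z = rng.uniform(size=n) < 0.25
data = np.where(z, weibull_min.rvs(2.9, scale=3.5, size=n, random_state=rng),
                   weibull_min.rvs(1.6, scale=0.6, size=n, random_state=rng))

npdf = lambda x, m, s2: np.exp(-0.5 * (x - m) ** 2 / s2) / np.sqrt(2 * np.pi * s2)

# Step 1: pilot f_0 = alpha_1 N(mu_1, s_1^2) + alpha_2 N(mu_2, s_2^2) fitted by EM.
alpha = np.array([0.5, 0.5])
mu = np.array([data.mean() - 0.5, data.mean() + 0.5])
s2 = np.array([1.0, 1.0])
for _ in range(30):
    q = np.array([a * npdf(data, m, s) for a, m, s in zip(alpha, mu, s2)]).T
    q /= q.sum(axis=1, keepdims=True)
    nj = q.sum(axis=0)
    alpha, mu = nj / n, (q * data[:, None]).sum(axis=0) / nj
    s2 = (q * (data[:, None] - mu) ** 2).sum(axis=0) / nj

f0 = lambda x: alpha[0] * npdf(x, mu[0], s2[0]) + alpha[1] * npdf(x, mu[1], s2[1])
phi = lambda x, j: alpha[j] * npdf(x, mu[j], s2[j]) / f0(x)

# Step 2: bandwidths minimizing the Gaussian-kernel AMISE (3.3), with f replaced
# by the pilot f_0 and the integrals approximated on a grid.
g = np.linspace(data.min() - 1, data.max() + 1, 2001)
dx = g[1] - g[0]
d2 = lambda v: np.gradient(np.gradient(v, dx), dx)
pf = [phi(g, j) * f0(g) for j in range(2)]
B = [[np.trapz(d2(pf[j]) * d2(pf[k]), g) for k in range(2)] for j in range(2)]
V = [[np.trapz(phi(g, j) * phi(g, k) * f0(g), g) for k in range(2)] for j in range(2)]

def amise(h):
    h1, h2 = h
    return (0.25 * h1**4 * B[0][0] + 0.25 * h2**4 * B[1][1] + 0.5 * h1**2 * h2**2 * B[0][1]
            + V[0][0] / (2 * np.sqrt(np.pi) * n * h1) + V[1][1] / (2 * np.sqrt(np.pi) * n * h2)
            + np.sqrt(2) * V[0][1] / (np.sqrt(np.pi) * n * np.sqrt(h1**2 + h2**2)))

h1, h2 = minimize(amise, x0=[0.2, 0.2], bounds=[(1e-3, 2), (1e-3, 2)]).x

# Final estimator (2.1), evaluated at a few points.
K = lambda t: np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)
def f_phi(x):
    x = np.atleast_1d(x)
    u1, u2 = (x[:, None] - data) / h1, (x[:, None] - data) / h2
    return (phi(data, 0) * K(u1) / h1 + phi(data, 1) * K(u2) / h2).sum(axis=1) / n

print(h1, h2, f_phi(np.array([0.5, 1.0, 2.0, 3.0])))
```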

References

1. Abramson I.S., Arbitrariness of the Pilot Estimator in Adaptive Kernel Methods, J. Multivariate Anal., 12 (1982).
2. Dempster A.P., Laird N.M., Rubin D.B., Maximum Likelihood from Incomplete Data via the EM Algorithm, J. Roy. Statist. Soc. Ser. B (1977).
3. Devroye L., Györfi L., Nonparametric Density Estimation: the L1 View, Wiley, New York.
4. Domański Cz., Pruska K., Nieklasyczne metody statystyczne, PWE, Warszawa.
5. Hall P., Marron J.S., Choice of Kernel Order in Density Estimation, Ann. Statist., 16 (1987).
6. Hjort N.L., Glad I.K., Nonparametric Density Estimation with a Parametric Start, Ann. Statist., 23, 3 (1995).
7. Jarnicka J., Dwukrokowa metoda estymacji gęstości, preprint.
8. Marron J.S., Wand M.P., Exact Mean Integrated Squared Error, Ann. Statist., 20, 2 (1992).
9. Park B.U., Marron J.S., Comparison of Data-Driven Bandwidth Selectors, J. Amer. Statist. Assoc.
10. Parzen E., On Estimation of a Probability Density Function and Mode, Ann. Math. Statist.
11. Rosenblatt M., Remarks on Some Nonparametric Estimates of a Density Function, Ann. Math. Statist.
12. Sheather S.J., Jones M.C., A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation, J. Roy. Statist. Soc. Ser. B, 53, 3 (1991).
13. Silverman B.W., Density Estimation for Statistics and Data Analysis, Chapman and Hall, London, New York.
14. Terrell G.R., Scott D.W., Variable Kernel Density Estimation, Ann. Statist., 20, 3 (1992).
15. Wand M.P., Marron J.S., Ruppert D., Transformations in Density Estimation, J. Amer. Statist. Assoc., 86, 414 (1991).
16. Wu C.F.J., On the Convergence Properties of the EM Algorithm, Ann. Statist., 11 (1983).

Received November 2.
jj@gamma.im.uj.edu.pl


More information

OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1

OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1 The Annals of Statistics 1997, Vol. 25, No. 6, 2512 2546 OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1 By O. V. Lepski and V. G. Spokoiny Humboldt University and Weierstrass Institute

More information

A Bayesian approach to parameter estimation for kernel density estimation via transformations

A Bayesian approach to parameter estimation for kernel density estimation via transformations A Bayesian approach to parameter estimation for kernel density estimation via transformations Qing Liu,, David Pitt 2, Xibin Zhang 3, Xueyuan Wu Centre for Actuarial Studies, Faculty of Business and Economics,

More information

arxiv: v2 [stat.me] 13 Sep 2007

arxiv: v2 [stat.me] 13 Sep 2007 Electronic Journal of Statistics Vol. 0 (0000) ISSN: 1935-7524 DOI: 10.1214/154957804100000000 Bandwidth Selection for Weighted Kernel Density Estimation arxiv:0709.1616v2 [stat.me] 13 Sep 2007 Bin Wang

More information

Assignment # 1. STA 355 (K. Knight) February 8, (a) by the Central Limit Theorem. n(g( Xn ) g(µ)) by the delta method ( ) (b) i.

Assignment # 1. STA 355 (K. Knight) February 8, (a) by the Central Limit Theorem. n(g( Xn ) g(µ)) by the delta method ( ) (b) i. STA 355 (K. Knight) February 8, 2014 Assignment # 1 Saša Milić 997 410 626 1. (a) n( n ) d N (0, σ 2 ()) by the Central Limit Theorem n(g( n ) g()) d g (θ)n (0, σ 2 ()) by the delta method σ 2 ()g (θ)

More information

Frontier estimation based on extreme risk measures

Frontier estimation based on extreme risk measures Frontier estimation based on extreme risk measures by Jonathan EL METHNI in collaboration with Ste phane GIRARD & Laurent GARDES CMStatistics 2016 University of Seville December 2016 1 Risk measures 2

More information

Brownian Motion. An Undergraduate Introduction to Financial Mathematics. J. Robert Buchanan. J. Robert Buchanan Brownian Motion

Brownian Motion. An Undergraduate Introduction to Financial Mathematics. J. Robert Buchanan. J. Robert Buchanan Brownian Motion Brownian Motion An Undergraduate Introduction to Financial Mathematics J. Robert Buchanan 2010 Background We have already seen that the limiting behavior of a discrete random walk yields a derivation of

More information

Positive data kernel density estimation via the logkde package for R

Positive data kernel density estimation via the logkde package for R Positive data kernel density estimation via the logkde package for R Andrew T. Jones 1, Hien D. Nguyen 2, and Geoffrey J. McLachlan 1 which is constructed from the sample { i } n i=1. Here, K (x) is a

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Computational Statistics. Jian Pei School of Computing Science Simon Fraser University

Computational Statistics. Jian Pei School of Computing Science Simon Fraser University Computational Statistics Jian Pei School of Computing Science Simon Fraser University jpei@cs.sfu.ca BASIC OPTIMIZATION METHODS J. Pei: Computational Statistics 2 Why Optimization? In statistical inference,

More information

Kullback-Leibler Designs

Kullback-Leibler Designs Kullback-Leibler Designs Astrid JOURDAN Jessica FRANCO Contents Contents Introduction Kullback-Leibler divergence Estimation by a Monte-Carlo method Design comparison Conclusion 2 Introduction Computer

More information

Stochastic Proximal Gradient Algorithm

Stochastic Proximal Gradient Algorithm Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind

More information

Nonparametric Function Estimation with Infinite-Order Kernels

Nonparametric Function Estimation with Infinite-Order Kernels Nonparametric Function Estimation with Infinite-Order Kernels Arthur Berg Department of Statistics, University of Florida March 15, 2008 Kernel Density Estimation (IID Case) Let X 1,..., X n iid density

More information