ON SOME TWO-STEP DENSITY ESTIMATION METHOD
UNIVERSITATIS IAGELLONICAE ACTA MATHEMATICA, FASCICULUS XLIII, 2005

ON SOME TWO-STEP DENSITY ESTIMATION METHOD

by Jolanta Jarnicka

Abstract. We introduce a new two-step kernel density estimation method, based on the EM algorithm and the generalized kernel density estimator. The accuracy obtained is better in particular in the case of multimodal or skewed densities.

1. Introduction. Density estimation has been investigated in many papers. In particular, the kernel estimation method, introduced in [11], has received much attention. Let us recall some problems connected with kernel density estimation. Let X_i : Ω → R, i = 1,..., n, be a sequence of random variables defined on a probability space (Ω, F, P). Suppose that they are independently and identically distributed with an unknown density f, and that {x_i} is a sequence of corresponding observations. The kernel density estimator is characterized by two components: the bandwidth h(n) and the kernel K.

Definition 1.1. Let {h(n)}_{n=1}^∞ ⊂ (0, +∞) with h(n) → 0. Let K : R → R be a measurable nonnegative function. The kernel density estimator is given by the formula

(1.1)    f_h(x) = (1/(n h(n))) Σ_{i=1}^n K((x − x_i)/h(n)).

We call h(n) the bandwidth (window width, or smoothing parameter). We call K the kernel. Statistical properties of this estimator, like biasedness, asymptotic unbiasedness and efficiency, were presented in [11], [10], and [13].
The kernel determines the regularity, symmetry about zero and shape of the estimator, while the bandwidth determines the amount of smoothing. In particular, f_h is a density, provided that K ≥ 0 and ∫ K(t) dt = 1. Considerable research has been carried out on the question of how one should select K in order to optimize properties of the kernel density estimator f_h (see e.g. [5], [4], and [13]) and numerous examples have been discussed. The suggested choice is a density, symmetric about zero, with finite variance, like e.g. the Gaussian kernel K(t) = (1/√(2π)) e^{−t²/2}, the Epanechnikov kernel K(t) = (3/(4√5))(1 − t²/5) for |t| < √5, or the rectangular kernel K(t) = 1/2 for |t| < 1 (see e.g. [13] or [4] for more examples). But in some situations nonsymmetric kernels, e.g. the exponential kernel K(t) = e^{−t} for t > 0, guarantee better results. A comparative study of this problem was carried out in [7].

The main problem of kernel density estimation is still the choice of the bandwidth h(n). Various measures of accuracy of the estimator f_h have been used to obtain the optimal bandwidth (see e.g. [3], [13]). We will here focus on the mean integrated squared error, given by

MISE(f_h) = E ∫ (f_h(x) − f(x))² dx,

and its asymptotic version for h(n) → 0 and n h(n) → ∞ as n → ∞, often abbreviated AMISE(f_h) (see [5], [8]). The mean integrated squared error can be written as a sum of the integrated squared bias and the integrated variance of f_h:

MISE(f_h) = ∫ (E f_h(x) − f(x))² dx + ∫ V(f_h(x)) dx.

We want to keep both the bias and the variance as small as possible, so the optimal bandwidth h(n) should be chosen so that MISE is minimal. Under additional assumptions on the density f (we assume that f is twice differentiable and that ∫ (f''(x))² dx < ∞) and for a kernel being a symmetric density with finite variance, we can provide asymptotic approximations for the bias and the variance of f_h (see [13]). Having obtained the formula for the asymptotic mean integrated squared error, we get the optimal bandwidth h(n) in the following form:

(1.2)    h_opt(n) = ( ∫ K²(t) dt / ( (∫ t²K(t) dt)² ∫ (f''(x))² dx ) )^{1/5} n^{−1/5}.
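Definition 1.1 with the Gaussian kernel can be written out directly. Below is a minimal Python sketch of formula (1.1); the function names are ours, not the paper's:

```python
import math

def gaussian_kernel(t):
    # K(t) = (1/sqrt(2*pi)) * exp(-t^2/2): a symmetric density with finite variance
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def kernel_density_estimate(x, sample, h, kernel=gaussian_kernel):
    # f_h(x) = (1/(n*h)) * sum_i K((x - x_i)/h)   -- formula (1.1)
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)
```

Since the Gaussian kernel is a density, f_h inherits nonnegativity and unit mass, as noted in the text; the bandwidth h controls only the amount of smoothing.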
Although the integrals ∫ K²(t) dt and ∫ t²K(t) dt can be calculated if K is known, formula (1.2) depends on the unknown density f, and hence, in practice, the ability to calculate h_opt(n) depends on obtaining an estimate for ∫ (f''(x))² dx. Several ways to solve this problem have appeared in the literature. One of the simplest methods, suggested in [13] (often called Silverman's rule of thumb), is to assume that f is known. For example, assuming that f is the density of a normal distribution with parameters µ and σ², and using the Gaussian kernel, we obtain h_opt(n) = 1.06 σ n^{−1/5}, where σ can be estimated by the sample standard deviation. Other best known methods (beside those giving automatic procedures for selecting the bandwidth, like cross-validation, see e.g. [13]) are based on the idea of constructing a pilot function and then using it in the actual estimation (see e.g. [13], [9], [1], [12], [15], and [6]). The problem is that none of these methods guarantees good results for a wide class of estimated density functions.

In this paper we propose a new two-step kernel density estimation method, which generalizes methods considered in [13] and [6]. Its practical application is in fact based on the solution of some optimization problem. In Section 2 we present the idea of the two-step method and prove some basic properties of the proposed generalized kernel density estimator. Section 3 is dedicated to the problem of the choice of optimal bandwidths {h_j(n)}_{j=1}^m. In Section 4 we propose an algorithm which enables us to construct a pilot. The last section contains a discussion of an example, which presents possibilities of our method. We emphasize that we have carried out a lot of experiments and computer simulations in order to check the efficiency of the new method in comparison with the methods mentioned. According to the statistical analysis, we can state that in some cases, e.g.
when estimating bimodal or skewed densities, our method gives better results than other known methods, and in numerous cases its efficiency is comparable to that of the methods mentioned (see [7]).

2. The idea of the two-step method. Let (Ω, F, P) be a probability space, X_i : Ω → R (i = 1, 2,..., n) a sequence of random variables, and suppose they are independently and identically distributed. We assume that f is the unknown density function of X_i and {x_i}_{i=1}^n is the sequence of observations on X_i. Consider a family of functions {φ_j(x)}_{j=1}^m, such that 0 ≤ φ_j(x) ≤ 1 and Σ_{j=1}^m φ_j(x) = 1, x ∈ R,
and a sequence of corresponding bandwidths {h_j(n)}_{j=1}^m, such that h_j(n) > 0 and h_j(n) → 0, j = 1,..., m. Assume also that the kernel K : R → R is a nonnegative measurable function.

Definition 2.1. We define the generalized kernel density estimator as a function given by

(2.1)    f_φ(x) = (1/n) Σ_{i=1}^n Σ_{j=1}^m (φ_j(x_i)/h_j(n)) K((x − x_i)/h_j(n)).

The proposed method consists of two steps:

1. We choose a pilot function f_0, by parametric estimation, using the EM algorithm (see Section 4.1), under the assumption that f_0 is given as follows:

f_0(x) = Σ_j α_j f_j(x),   where f_j(x) = (1/√(2πσ_j²)) e^{−(x−µ_j)²/(2σ_j²)}, j = 1,..., m.

2. Considering the generalized kernel density estimator (nonparametric estimation), we choose a sequence of bandwidths h_j(n) > 0, h_j(n) → 0 for n → ∞, j = 1,..., m, using the pilot calculated in step 1, by taking

(2.2)    φ_j(x) = α_j f_j(x)/f_0(x) = α_j f_j(x) / Σ_{k=1}^m α_k f_k(x).

Remark 2.2. The first modification with respect to the traditional kernel estimation method is the fact that we choose a sequence of bandwidths h_j(n), j = 1,..., m, instead of the single value h(n). This may cause some complication in calculations but guarantees a better accuracy of the estimator.

Remark 2.3. Formula (2.1) corresponds to the methods described in [1], [13] and in [15]. The idea of the construction of the pilot function by parametric estimation comes from [6].

Remark 2.4. Note that for m = 1, taking φ_1 ≡ 1, we get the kernel density estimator (1.1), with the bandwidth h_1(n).

Remark 2.5. As in the case of the kernel density estimator (1.1), the kernel K has an essential effect on the properties of f_φ. In particular, if K is a density, so that K(x) ≥ 0 and ∫ K(x) dx = 1, then also f_φ(x) ≥ 0 and
∫ f_φ(x) dx = (1/n) Σ_{i=1}^n Σ_{j=1}^m φ_j(x_i) ∫ (1/h_j(n)) K((x − x_i)/h_j(n)) dx = (1/n) Σ_{i=1}^n Σ_{j=1}^m φ_j(x_i) = 1,

since putting (x − x_i)/h_j(n) = t, for a fixed j, gives ∫ (1/h_j(n)) K((x − x_i)/h_j(n)) dx = ∫ K(t) dt = 1, and Σ_{j=1}^m φ_j(x_i) = 1.

2.1. Properties of the generalized kernel density estimator. We will now present some statistical properties of f_φ and then use its asymptotic behaviour in a criterion for the choice of h_j(n), j = 1,..., m. Under the above assumptions and notation, the following lemma is true.

Lemma 2.6. Assume that E[(φ_j(X_i)/h_j(n)) K((x − X_i)/h_j(n))] < ∞ and E[(φ_j(X_i)φ_k(X_i)/(h_j(n)h_k(n))) K((x − X_i)/h_j(n)) K((x − X_i)/h_k(n))] < ∞, j, k = 1,..., m. Then

(2.3)    E f_φ(x) = Σ_{j=1}^m ∫ K(y) φ_j(x + h_j(n)y) f(x + h_j(n)y) dy,

(2.4)    V f_φ(x) = (1/n) Σ_{j,k=1}^m (1/(h_j(n)h_k(n))) ∫ K((x−y)/h_j(n)) K((x−y)/h_k(n)) φ_j(y)φ_k(y) f(y) dy − (1/n) ( Σ_{j=1}^m ∫ K(y) φ_j(x + h_j(n)y) f(x + h_j(n)y) dy )².

Proof. Fix n ∈ N. Since K is a measurable nonnegative function and {X_i}_{i=1}^n is a sequence of independently and identically distributed random variables, { Σ_{j=1}^m (φ_j(X_i)/h_j(n)) K((x − X_i)/h_j(n)) }_{i=1}^n forms a sequence of independently and identically distributed random variables. Thus

E f_φ(x) = (1/n) Σ_{i=1}^n E[ Σ_{j=1}^m (φ_j(X_i)/h_j(n)) K((x − X_i)/h_j(n)) ] = Σ_{j=1}^m E[(φ_j(X_i)/h_j(n)) K((x − X_i)/h_j(n))].
Observe that, for every j = 1,..., m,

E[(φ_j(X_i)/h_j(n)) K((x − X_i)/h_j(n))] = ∫ (1/h_j(n)) K((x − t)/h_j(n)) φ_j(t) f(t) dt.

Letting t = x + h_j(n)y, we have dt = h_j(n) dy, and

∫ (1/h_j(n)) K((x − t)/h_j(n)) φ_j(t) f(t) dt = ∫ K(y) φ_j(x + h_j(n)y) f(x + h_j(n)y) dy.

Hence, summing both sides over j = 1,..., m, we get the expected value of the generalized kernel estimator:

E f_φ(x) = Σ_{j=1}^m ∫ K(y) φ_j(x + h_j(n)y) f(x + h_j(n)y) dy.

To prove (2.4), we use the equation V f_φ(x) = E f_φ²(x) − (E f_φ(x))². There is

V f_φ(x) = (1/n²) Σ_{i=1}^n E[ ( Σ_{j=1}^m (φ_j(X_i)/h_j(n)) K((x − X_i)/h_j(n)) )² ] − (1/n²) Σ_{i=1}^n ( Σ_{j=1}^m E[(φ_j(X_i)/h_j(n)) K((x − X_i)/h_j(n))] )²

= (1/n²) Σ_{i=1}^n E[ Σ_{j=1}^m (φ_j(X_i)/h_j(n)) K((x − X_i)/h_j(n)) Σ_{k=1}^m (φ_k(X_i)/h_k(n)) K((x − X_i)/h_k(n)) ] − (1/n) ( Σ_{j=1}^m E[(φ_j(X_i)/h_j(n)) K((x − X_i)/h_j(n))] )².
Thanks to the independence of X_i, i = 1,..., n, the first term can be written as

(1/n²) Σ_{i=1}^n E[ Σ_{j=1}^m (φ_j(X_i)/h_j(n)) K((x − X_i)/h_j(n)) Σ_{k=1}^m (φ_k(X_i)/h_k(n)) K((x − X_i)/h_k(n)) ]

= (1/n) Σ_{j=1}^m Σ_{k=1}^m ∫ (φ_j(y)/h_j(n)) K((x − y)/h_j(n)) (φ_k(y)/h_k(n)) K((x − y)/h_k(n)) f(y) dy

= (1/n) Σ_{j,k=1}^m (1/(h_j(n)h_k(n))) ∫ K((x − y)/h_j(n)) K((x − y)/h_k(n)) φ_j(y)φ_k(y) f(y) dy.

Hence the variance of the generalized kernel estimator is given by

V f_φ(x) = (1/n) Σ_{j,k=1}^m (1/(h_j(n)h_k(n))) ∫ K((x − y)/h_j(n)) K((x − y)/h_k(n)) φ_j(y)φ_k(y) f(y) dy − (1/n) ( Σ_{j=1}^m ∫ K(y) φ_j(x + h_j(n)y) f(x + h_j(n)y) dy )²,

which completes the proof.

Remark 2.7. Taking m = 1 and φ_1 ≡ 1 in Lemma 2.6, we obtain the formulae for the expected value and variance of the traditional kernel estimator f_h. They can be found for example in [13].

In order to show some statistical properties of f_φ, we use the following lemma (see [10] for a similar result for the traditional kernel estimator).

Lemma 2.8. Under the above assumptions, let K be a bounded and integrable function such that lim_{|t|→∞} |t| K(t) = 0, and let h_j(n) > 0, h_j(n) → 0, for n → ∞ and j = 1,..., m. Then for every point x of continuity of f there is

Σ_{j=1}^m E[(φ_j(X_i)/h_j(n)) K((x − X_i)/h_j(n))] → f(x) ∫ K(t) dt,   n → ∞.

Proof. Our goal is to show that for every j = 1,..., m

lim_{n→∞} ∫ (1/h_j(n)) ( f(x + z)φ_j(x + z) − f(x)φ_j(x + z) ) K(z/h_j(n)) dz = 0.
For a fixed j, there is

| ∫ (1/h_j(n)) ( f(x + z)φ_j(x + z) − f(x)φ_j(x + z) ) K(z/h_j(n)) dz | ≤ ∫ (1/h_j(n)) | f(x + z)φ_j(x + z) − f(x)φ_j(x + z) | K(z/h_j(n)) dz.

Letting ζ = −z, we obtain (with dz = −dζ)

∫ (1/h_j(n)) | f(x − ζ) − f(x) | φ_j(x − ζ) K(−ζ/h_j(n)) dζ,

for K is nonnegative. Observe that φ_j(x − ζ) ≤ 1 for every x, ζ. Therefore,

∫ (1/h_j(n)) | f(x − ζ) − f(x) | φ_j(x − ζ) K(−ζ/h_j(n)) dζ ≤ ∫ (1/h_j(n)) | f(x − ζ) − f(x) | K(−ζ/h_j(n)) dζ.

Take an arbitrary δ > 0. Splitting the integration interval into two parts, {ζ : |ζ| ≥ δ} and {ζ : |ζ| < δ}, and applying the triangle inequality to the first term, we get

∫_{|ζ|≥δ} | f(x − ζ) − f(x) | (1/h_j(n)) K(−ζ/h_j(n)) dζ ≤ ∫_{|ζ|≥δ} f(x − ζ) (1/h_j(n)) K(−ζ/h_j(n)) dζ + f(x) ∫_{|ζ|≥δ} (1/h_j(n)) K(−ζ/h_j(n)) dζ.
Since K is bounded and f is a density, we get

∫_{|ζ|≥δ} f(x − ζ) (1/h_j(n)) K(−ζ/h_j(n)) dζ ≤ (1/δ) sup_{|y| ≥ δ/h_j(n)} |y K(y)| ∫_{|ζ|≥δ} f(x − ζ) dζ

and

f(x) ∫_{|ζ|≥δ} (1/h_j(n)) K(−ζ/h_j(n)) dζ = f(x) ∫_{|z| ≥ δ/h_j(n)} K(z) dz.

Hence, as n → ∞, the first term tends to 0:

∫_{|ζ|≥δ} | f(x − ζ) − f(x) | (1/h_j(n)) K(−ζ/h_j(n)) dζ ≤ (1/δ) sup_{|y| ≥ δ/h_j(n)} |y K(y)| + f(x) ∫_{|z| ≥ δ/h_j(n)} K(z) dz → 0,

by the assumption lim_{|t|→∞} |t| K(t) = 0. From the continuity of f at x, the second integral may be estimated as follows:

∫_{|ζ|<δ} | f(x − ζ) − f(x) | (1/h_j(n)) K(−ζ/h_j(n)) dζ ≤ ε_δ ∫_{|ζ|<δ} (1/h_j(n)) K(−ζ/h_j(n)) dζ = ε_δ ∫_{|y| < δ/h_j(n)} K(y) dy,

and the right-hand side can be made arbitrarily small by the choice of δ. Thus, for every j = 1,..., m,

E[(φ_j(X_i)/h_j(n)) K((x − X_i)/h_j(n))] → f(x)φ_j(x) ∫ K(t) dt,   n → ∞.

After summing both sides over j, the proof is completed.

Remark 2.9. Under the additional assumption that ∫ K(t) dt = 1,

E f_φ(x) − f(x) → 0,   n → ∞,

and hence the generalized kernel estimator is asymptotically unbiased. Provided that, for every j = 1,..., m, n h_j(n) → ∞ for n → ∞, we obtain V f_φ(x) → 0 for n → ∞.
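The generalized estimator of Definition 2.1, whose basic properties were just established, is simple to implement. A minimal Python sketch of formula (2.1); function names are ours, and the weight functions are supplied by the caller:

```python
import math

def gaussian_kernel(t):
    # symmetric density kernel K(t) = (1/sqrt(2*pi)) * exp(-t^2/2)
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def generalized_kde(x, sample, phis, hs, kernel=gaussian_kernel):
    # f_phi(x) = (1/n) * sum_i sum_j phi_j(x_i)/h_j * K((x - x_i)/h_j)  -- formula (2.1)
    # phis: weight functions with 0 <= phi_j <= 1 and sum_j phi_j == 1 pointwise
    # hs:   bandwidths h_j > 0, one per component
    n = len(sample)
    total = 0.0
    for xi in sample:
        for phi, h in zip(phis, hs):
            total += phi(xi) / h * kernel((x - xi) / h)
    return total / n
```

For m = 1 and phi ≡ 1 this reduces to the traditional estimator (1.1), as noted in Remark 2.4, and when K is a density, f_φ again integrates to 1 because the weights sum to 1 at every sample point.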
3. Criterion for the bandwidth choice. As in the case of the traditional kernel estimator (1.1), we will use the asymptotic mean integrated squared error AMISE(f_φ) as a criterion for the choice of h_j(n) > 0, h_j(n) → 0 for n → ∞, j = 1,..., m. We obtain the formula for AMISE(f_φ) by asymptotic approximations of E f_φ(x) − f(x) and V f_φ(x). We will use a well-known corollary of Taylor's formula.

Corollary 3.1. Suppose that g is an arbitrary function, n-times differentiable in a neighbourhood of x_0, and that g^{(n)}(x_0) exists. Then

∀ ε > 0 ∃ δ > 0 ∀ 0 < θ < δ : | g(x_0 + θ) − g(x_0) − θ g'(x_0) − ... − (θⁿ/n!) g^{(n)}(x_0) | < (θⁿ/n!) ε.

Under the assumptions from the last section, we will formulate the criterion for the choice of the sequence of bandwidths. From now on, we assume that K is a measurable nonnegative function such that ∫ t²K(t) dt < ∞ and ∫ K(t) dt = 1. We will consider two cases, for a symmetric and a nonsymmetric kernel, starting with the symmetric case.

Theorem 3.2. Under the assumptions from the last section, suppose that the functions φ_j f, j = 1,..., m, are differentiable in a neighbourhood of x, the second derivatives (φ_j f)''(x), j = 1,..., m, exist at x, and the kernel K satisfies ∫ t K(t) dt = 0. Then

E f_φ(x) = f(x) + (1/2) Σ_{j=1}^m h_j(n)² (φ_j f)''(x) ∫ t²K(t) dt + o( Σ_{j=1}^m h_j(n)² ),

V f_φ(x) = (1/n) Σ_{j,k=1}^m (1/(h_j(n)h_k(n))) ∫ K((x−y)/h_j(n)) K((x−y)/h_k(n)) φ_j(y)φ_k(y) f(y) dy + o( 1/(n min_j h_j(n)) ).

Proof. Let ε > 0. By Corollary 3.1, for every j = 1,..., m, there exists a δ > 0 such that for 0 < |t| < δ,

| φ_j(x + h_j(n)t) f(x + h_j(n)t) − φ_j(x)f(x) − h_j(n)t (φ_j f)'(x) − (h_j(n)²t²/2) (φ_j f)''(x) | < (h_j(n)²t²/2!) ε.
Multiplying the inequalities

−(h_j(n)²t²/2!) ε < φ_j(x + h_j(n)t) f(x + h_j(n)t) − φ_j(x)f(x) − h_j(n)t (φ_j f)'(x) − (h_j(n)²t²/2!) (φ_j f)''(x) < (h_j(n)²t²/2!) ε

by K(t), and integrating with respect to t, we obtain

−(h_j(n)²ε/2!) ∫ t²K(t) dt < ∫ φ_j(x + h_j(n)t) f(x + h_j(n)t) K(t) dt − φ_j(x)f(x) − (h_j(n)²/2) (φ_j f)''(x) ∫ t²K(t) dt < (h_j(n)²ε/2!) ∫ t²K(t) dt,

provided that K is a symmetric density (so that ∫ t K(t) dt = 0). Summing over j = 1,..., m, by (2.3), we get

| E f_φ(x) − f(x) − (1/2) Σ_{j=1}^m h_j(n)² (φ_j f)''(x) ∫ t²K(t) dt | < (ε/2) Σ_{j=1}^m h_j(n)² ∫ t²K(t) dt.

Since h_j(n) > 0, j = 1,..., m, denoting µ₂ = ∫ t²K(t) dt, we obtain

| E f_φ(x) − f(x) − (1/2) Σ_{j=1}^m h_j(n)² (φ_j f)''(x) µ₂ | < (ε/2) µ₂ Σ_{j=1}^m h_j(n)².

Since ε > 0 is arbitrary and h_j(n) → 0, n → ∞, for j = 1,..., m, we get

E f_φ(x) − f(x) − (1/2) Σ_{j=1}^m h_j(n)² (φ_j f)''(x) µ₂ = o( Σ_{j=1}^m h_j(n)² ).
Now, since

V f_φ(x) = (1/n²) Σ_{i=1}^n E[ Σ_{j=1}^m (φ_j(X_i)/h_j(n)) K((x − X_i)/h_j(n)) Σ_{k=1}^m (φ_k(X_i)/h_k(n)) K((x − X_i)/h_k(n)) ] − (1/n) ( Σ_{j=1}^m E[(φ_j(X_i)/h_j(n)) K((x − X_i)/h_j(n))] )²,

applying Lemma 2.8 and Remark 2.9, we conclude that the second term is o( 1/(n min_j h_j(n)) ). By (2.4), the first term equals

(1/n) Σ_{j,k=1}^m (1/(h_j(n)h_k(n))) ∫ K((x−y)/h_j(n)) K((x−y)/h_k(n)) φ_j(y)φ_k(y) f(y) dy,

which completes the proof.

In the nonsymmetric case, we proceed in the following way.

Theorem 3.3. Under the assumptions from the last section, if the functions φ_j f are differentiable at x, for j = 1,..., m, and ∫ t K(t) dt ≠ 0, and ∫ |t| K(t) dt < ∞, then

E f_φ(x) = f(x) + Σ_{j=1}^m h_j(n) (φ_j f)'(x) ∫ t K(t) dt + o( Σ_{j=1}^m h_j(n) ),

V f_φ(x) = (1/n) Σ_{j,k=1}^m (1/(h_j(n)h_k(n))) ∫ K((x−y)/h_j(n)) K((x−y)/h_k(n)) φ_j(y)φ_k(y) f(y) dy + o( 1/(n min_j h_j(n)) ).

Proof. Let ε > 0 be arbitrary. By the Peano form of the remainder, for every j = 1,..., m, there exists a δ > 0 such that for 0 < |t| < δ,

| φ_j(x + t) f(x + t) − φ_j(x)f(x) − t (φ_j f)'(x) | < |t| ε.

Multiplying by K(t), and integrating with respect to t, we get, by the assumptions on K,

| ∫ φ_j(x + h_j(n)t) f(x + h_j(n)t) K(t) dt − φ_j(x)f(x) − h_j(n) (φ_j f)'(x) µ₁ | < ε h_j(n) µ₁,

where µ₁ = ∫ t K(t) dt.
Summing over j = 1,..., m, by (2.3), we obtain

| E f_φ(x) − f(x) − µ₁ Σ_{j=1}^m h_j(n) (φ_j f)'(x) | < ε µ₁ Σ_{j=1}^m h_j(n).

Since ε > 0 is arbitrary and, for every j, h_j(n) → 0, n → ∞, we get

E f_φ(x) − f(x) − µ₁ Σ_{j=1}^m h_j(n) (φ_j f)'(x) = o( Σ_{j=1}^m h_j(n) ).

In the second part of the proof we can proceed as in the symmetric case (see the proof of Theorem 3.2).

Under the above assumptions, the following corollaries are true.

Corollary 3.4. By Theorem 3.2, if K is symmetric and ∫ | (φ_j f)''(x) (φ_k f)''(x) | dx < ∞, for every j, k = 1,..., m, then the asymptotic mean integrated squared error is given by

(3.1)    AMISE(f_φ) = Σ_{j=1}^m Σ_{k=1}^m (1/4) h_j(n)² h_k(n)² µ₂² ∫ (φ_j f)''(x) (φ_k f)''(x) dx + (1/n) Σ_{j=1}^m Σ_{k=1}^m (1/h_k(n)) ∫ K(t) K(t h_j(n)/h_k(n)) dt ∫ φ_j(y)φ_k(y) f(y) dy,

where µ₂ = ∫ t²K(t) dt.

Corollary 3.5. By Theorem 3.3, if K is nonsymmetric and ∫ | (φ_j f)'(x) (φ_k f)'(x) | dx < ∞,
for every j, k = 1,..., m, then the asymptotic mean integrated squared error is given by

(3.2)    AMISE(f_φ) = Σ_{j=1}^m Σ_{k=1}^m h_j(n) h_k(n) µ₁² ∫ (φ_j f)'(x) (φ_k f)'(x) dx + (1/n) Σ_{j=1}^m Σ_{k=1}^m (1/h_k(n)) ∫ K(t) K(t h_j(n)/h_k(n)) dt ∫ φ_j(y)φ_k(y) f(y) dy,

where µ₁ = ∫ t K(t) dt.

Unfortunately, the complexity of formulae (3.1) and (3.2) prevents the derivation of explicit formulae for h_j(n), j = 1,..., m; hence the choice of the sequence of bandwidths should be done numerically, by minimization of AMISE (3.1) or (3.2), for a symmetric and a nonsymmetric kernel, respectively. Obviously, the above formulae get simpler if we know the kernel. For example, in the case of the Gaussian kernel K_g and for fixed j, k, we get µ₂ = ∫ t²K_g(t) dt = 1 and

∫ K_g(t) K_g(t h_j(n)/h_k(n)) dt = h_k(n) / ( √(2π) √(h_j(n)² + h_k(n)²) ).

Therefore,

AMISE(f_φ) = Σ_{j=1}^m Σ_{k=1}^m [ (1/4) h_j(n)² h_k(n)² ∫ (φ_j f)''(x) (φ_k f)''(x) dx + ( 1/( √(2π) n √(h_j(n)² + h_k(n)²) ) ) ∫ φ_j(x)φ_k(x) f(x) dx ].

In the case of the rectangular kernel K_p, for fixed j, k, µ₂ = ∫ t²K_p(t) dt = 1/3 and

(1/h_k(n)) ∫ K_p(t) K_p(t h_j(n)/h_k(n)) dt = min{h_j(n), h_k(n)} / (2 h_j(n) h_k(n)),

and hence

AMISE(f_φ) = Σ_{j=1}^m Σ_{k=1}^m [ (1/36) h_j(n)² h_k(n)² ∫ (φ_j f)''(x) (φ_k f)''(x) dx + ( min{h_j(n), h_k(n)} / (2n h_j(n) h_k(n)) ) ∫ φ_j(x)φ_k(x) f(x) dx ].
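Since the bandwidths have to be found by numerical minimization, it is worth checking a numerical minimizer against a case where the answer is known: for m = 1 the minimizer of the traditional AMISE is the closed form (1.2). A minimal Python sketch (all names are ours; the grid search stands in for whatever optimizer is actually used):

```python
import math

def amise(h, n, r_kernel, mu2, r_f2):
    # traditional (m = 1) AMISE: (h^4/4) * mu2^2 * R(f'') + R(K)/(n*h),
    # where R(g) denotes the integral of g^2
    return 0.25 * h ** 4 * mu2 ** 2 * r_f2 + r_kernel / (n * h)

def h_opt(n, r_kernel, mu2, r_f2):
    # closed form (1.2): h = ( R(K) / (mu2^2 * R(f'')) )^(1/5) * n^(-1/5)
    return (r_kernel / (mu2 ** 2 * r_f2)) ** 0.2 * n ** -0.2

def grid_minimize(f, lo, hi, steps=100000):
    # crude one-dimensional grid search, standing in for the numerical
    # minimization of (3.1)/(3.2) in the two-bandwidth case
    best_x, best_val = lo, f(lo)
    for k in range(1, steps + 1):
        x = lo + (hi - lo) * k / steps
        val = f(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x
```

For the Gaussian kernel, R(K) = 1/(2√π) and µ₂ = 1; taking f standard normal, R(f'') = 3/(8√π), which recovers the constant 1.06 of Silverman's rule of thumb, since (4/3)^{1/5} ≈ 1.06.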
Using the nonsymmetric exponential kernel K_w(t) = e^{−t}, t > 0, we obtain

AMISE(f_φ) = Σ_{j=1}^m Σ_{k=1}^m [ h_j(n) h_k(n) ∫ (φ_j f)'(x) (φ_k f)'(x) dx + ( 1/(n (h_j(n) + h_k(n))) ) ∫ φ_j(x)φ_k(x) f(x) dx ],

since, for fixed j, k, there is µ₁ = ∫ t K_w(t) dt = 1 and

(1/h_k(n)) ∫ K_w(t) K_w(t h_j(n)/h_k(n)) dt = 1/(h_j(n) + h_k(n)).

Remark 3.6. If K is symmetric, taking m = 1 and φ_1 ≡ 1 (see Remark 2.4), by (3.1) we obtain

AMISE(f_φ) = (h(n)⁴/4) ( ∫ t²K(t) dt )² ∫ ((φ_1 f)''(x))² dx + (1/(n h(n))) ∫ K²(t) dt ∫ φ_1²(x) f(x) dx = (h(n)⁴/4) ( ∫ t²K(t) dt )² ∫ (f''(x))² dx + (1/(n h(n))) ∫ K²(t) dt,

which gives the formula for AMISE for the traditional kernel estimator (1.1).

Remark 3.7. Using a nonsymmetric kernel, for m = 1 and φ_1 ≡ 1, by (3.2) we get the formula for AMISE for estimator (1.1):

AMISE(f_φ) = h(n)² ( ∫ t K(t) dt )² ∫ ((φ_1 f)'(x))² dx + (1/(n h(n))) ∫ K²(t) dt ∫ φ_1²(x) f(x) dx = h(n)² ( ∫ t K(t) dt )² ∫ (f'(x))² dx + (1/(n h(n))) ∫ K²(t) dt

(see [7] for results on the nonsymmetric case in traditional kernel estimation).

In Section 4 we will construct the pilot function by parametric estimation. The main problem is the choice of the number of components in the model

α₁ f₁(x|θ₁) + α₂ f₂(x|θ₂) + ... + α_m f_m(x|θ_m).

This is a known and much investigated problem in many methods of parametric estimation. However, in our case, it is not so important, since the pilot is just
a tool for further estimation. So we suggest the choice of m = 2, because it does not decrease the efficiency of the method considered.

Remark 3.8. In the case m = 2, we choose two bandwidths h₁(n) and h₂(n), minimizing:

for the Gaussian kernel:

(3.3)    AMISE(f_φ) = (1/2) h₁(n)² h₂(n)² ∫ (φ₁f)''(x) (φ₂f)''(x) dx + (1/4) h₁(n)⁴ ∫ ((φ₁f)''(x))² dx + (1/4) h₂(n)⁴ ∫ ((φ₂f)''(x))² dx + ( 1/(2n h₁(n) √π) ) ∫ φ₁²(x) f(x) dx + ( 1/(2n h₂(n) √π) ) ∫ φ₂²(x) f(x) dx + ( 2/(n √(2π) √(h₁(n)² + h₂(n)²)) ) ∫ φ₁(x)φ₂(x) f(x) dx,

for the rectangular kernel:

AMISE(f_φ) = (1/18) h₁(n)² h₂(n)² ∫ (φ₁f)''(x) (φ₂f)''(x) dx + (1/36) h₁(n)⁴ ∫ ((φ₁f)''(x))² dx + (1/36) h₂(n)⁴ ∫ ((φ₂f)''(x))² dx + ( 1/(2n h₁(n)) ) ∫ φ₁²(x) f(x) dx + ( 1/(2n h₂(n)) ) ∫ φ₂²(x) f(x) dx + ( min{h₁(n), h₂(n)} / (n h₁(n) h₂(n)) ) ∫ φ₁(x)φ₂(x) f(x) dx,

for the exponential kernel:

AMISE(f_φ) = h₁(n)² ∫ ((φ₁f)'(x))² dx + h₂(n)² ∫ ((φ₂f)'(x))² dx + 2 h₁(n) h₂(n) ∫ (φ₁f)'(x) (φ₂f)'(x) dx + ( 1/(2n h₁(n)) ) ∫ φ₁²(x) f(x) dx + ( 1/(2n h₂(n)) ) ∫ φ₂²(x) f(x) dx + ( 2/(n (h₁(n) + h₂(n))) ) ∫ φ₁(x)φ₂(x) f(x) dx.

4. Construction of the pilot function (parametric estimation). In this section we present a way of constructing the pilot function, based on a modification of the well-known maximum likelihood method. We apply the Expectation-Maximization algorithm, introduced in [2] and used to estimate
the parameters in stochastic models, in hidden Markov models, to estimate a hazard function, as well as to estimate the parameters in mixture models. The last of these applications is the matter of our interest and will be used in the normal mixture model

α₁ N(µ₁, σ₁²) + α₂ N(µ₂, σ₂²) + ... + α_m N(µ_m, σ_m²).

4.1. The EM algorithm. Let x = (x₁, x₂,..., x_n) be a sample of independent observations from an absolutely continuous distribution with a density function f_X(x|θ), where θ denotes the parameters of the distribution.

Definition 4.1. We define the likelihood function and the log-likelihood function

L_x(θ|x) := Π_{i=1}^n f_X(x_i|θ),

l_x(θ|x) := ln L_x(θ|x) = Σ_{i=1}^n ln f_X(x_i|θ).

The maximum likelihood estimator of θ is given by

θ̂ = argmax_{θ∈Θ} L_x(θ|x) = argmax_{θ∈Θ} l_x(θ|x).

To present the EM algorithm, we consider the data model

(4.1)    {(x_i, y_i), i = 1,..., n},

from the distribution of the random variable (X, Y), where x = (x₁,..., x_n) is a sample of independent observations from the absolutely continuous distribution of X, with the density f_X treated as a marginal density of (X, Y), and y = (y₁,..., y_n) denotes the latent (or missing) data. The sample x, together with y, is called the complete data.

Definition 4.2. For the data model (4.1), the likelihood and log-likelihood functions are defined as follows:

L(θ|x, y) = Π_{i=1}^n f(x_i, y_i|θ),

l(θ|x, y) = ln L(θ|x, y) = Σ_{i=1}^n ln f(x_i, y_i|θ).
From now on we will use the following notation:

(4.4)    f_X(x|θ) = Π_{i=1}^n f_X(x_i|θ),   f(x, y|θ) = Π_{i=1}^n f(x_i, y_i|θ),   f_{Y|X}(y|x, θ) = Π_{i=1}^n f_{Y|X}(y_i|x_i, θ).

Corollary 4.3. For the considered data model, the maximum likelihood estimator of θ is given by

θ̂ = argmax_{θ∈Θ} ( l(θ|x, y) − ln f_{Y|X}(y|x, θ) ),

where f_{Y|X}(y|x, θ) denotes the conditional density of Y, given X = x and θ.

Proof. By the definition of conditional density, f_X(x|θ) = f(x, y|θ) / f_{Y|X}(y|x, θ), and hence, by Definition 4.1,

θ̂ = argmax_{θ∈Θ} l_x(θ|x) = argmax_{θ∈Θ} ( ln f(x, y|θ) − ln f_{Y|X}(y|x, θ) ) = argmax_{θ∈Θ} ( l(θ|x, y) − ln f_{Y|X}(y|x, θ) ).

Now, let

(4.7)    Q(θ, θ_t) := E[ ln f(x, y|θ) | x, θ_t ],

(4.8)    H(θ, θ_t) := E[ ln f_{Y|X}(y|x, θ) | x, θ_t ],

where E[· | x, θ_t] denotes the conditional expected value, given X = x and fixed θ = θ_t, θ_t being the value of the parameter θ in iteration t. The EM algorithm gives an iterative procedure that maximizes Q(θ, θ_t), for every t = 0, 1, 2,..., giving θ_{t+1} = argmax_θ Q(θ, θ_t). After a sufficient number of iterations, argmax_θ l_x(θ|x) will be found with high probability (see [2], [16]).

Theorem 4.4. Under the above assumptions, the procedure θ_{t+1} = argmax_{θ∈Θ} Q(θ, θ_t),
for t = 0, 1, 2,..., guarantees that

l_x(θ_{t+1}|x) ≥ l_x(θ_t|x),

with equality iff Q(θ_{t+1}, θ_t) = Q(θ_t, θ_t).

Proof. Consider the expression Q(θ_{t+1}, θ_t) − H(θ_{t+1}, θ_t). From formulae (4.7) and (4.8), it follows that

Q(θ_{t+1}, θ_t) − H(θ_{t+1}, θ_t) = E[ ln f(x, y|θ_{t+1}) | x, θ_t ] − E[ ln f_{Y|X}(y|x, θ_{t+1}) | x, θ_t ] = E[ ln ( f(x, y|θ_{t+1}) / f_{Y|X}(y|x, θ_{t+1}) ) | x, θ_t ],

and by the definition of conditional density and (4.4),

l_x(θ_{t+1}|x) = ln ( f(x, y|θ_{t+1}) / f_{Y|X}(y|x, θ_{t+1}) ).

Thus Q(θ_{t+1}, θ_t) − H(θ_{t+1}, θ_t) = E[ l_x(θ_{t+1}|x) | x, θ_t ] = l_x(θ_{t+1}|x), since l_x(θ_{t+1}|x) does not depend on y. By the monotonicity of the conditional expected value, l_x(θ_{t+1}|x) − l_x(θ_t|x) ≥ 0, provided that

( Q(θ_{t+1}, θ_t) − Q(θ_t, θ_t) ) + ( H(θ_t, θ_t) − H(θ_{t+1}, θ_t) ) ≥ 0.

Note that the first term Q(θ_{t+1}, θ_t) − Q(θ_t, θ_t) ≥ 0, by the definition of θ_{t+1}. The second one, by (4.8) and the definition of the conditional expected value, can be written as

H(θ_t, θ_t) − H(θ_{t+1}, θ_t) = E[ ln f_{Y|X}(y|x, θ_t) | x, θ_t ] − E[ ln f_{Y|X}(y|x, θ_{t+1}) | x, θ_t ] = ∫ ln ( f_{Y|X}(y|x, θ_t) / f_{Y|X}(y|x, θ_{t+1}) ) f_{Y|X}(y|x, θ_t) dy.

As

ln ( f_{Y|X}(y|x, θ_t) / f_{Y|X}(y|x, θ_{t+1}) ) = −ln ( 1 + ( f_{Y|X}(y|x, θ_{t+1}) − f_{Y|X}(y|x, θ_t) ) / f_{Y|X}(y|x, θ_t) )

and ln(1 + t) ≤ t for every t > −1, t ≠ 0, the last integral is bounded below by −∫ ( f_{Y|X}(y|x, θ_{t+1}) − f_{Y|X}(y|x, θ_t) ) dy = 0, since both conditional densities integrate to 1, and the proof is completed.
The EM algorithm can be stated as follows:

Start: Set the initial value θ_0.

Each step consists of two substeps. For every t = 0, 1, 2,... we proceed as follows:

Expectation step: Let θ_t be the current estimate of the unknown parameters θ. By (4.7), we calculate

Q(θ, θ_t) = E[ ln f(x, y|θ) | x, θ_t ] = ∫ ln f(x, y|θ) f_{Y|X}(y|x, θ_t) dy = ∫ ln f(x, y|θ) ( f(x, y|θ_t) / f_X(x|θ_t) ) dy.

Maximization step: Since f_X(x|θ_t) does not depend on θ, we get

θ_{t+1} = argmax_θ Q(θ, θ_t) = argmax_θ ∫ ln f(x, y|θ) f(x, y|θ_t) dy.

By iterating the EM steps, the algorithm converges to parameters that should be the maximum likelihood parameters, but the convergence speed is rather low (see e.g. [16]). There has been some research on accelerating the convergence of the EM algorithm, but the resulting procedures are usually difficult to carry out. In this paper we will focus on the traditional form of the EM algorithm and apply it to the estimation of a mixture of normal densities.

4.2. Fitting the mixture of normal densities. As previously, consider x = (x₁, x₂,..., x_n), a sample of independent observations from a density f_X(x|θ), where θ = (θ₁,..., θ_m) denotes the parameters of the distribution, and consider the following model. Suppose that f_X is given by the mixture density

f_X(x|θ) = Σ_{j=1}^m α_j f_j(x|θ_j),

where

f_j(x|θ_j) = (1/√(2πσ_j²)) e^{−(x−µ_j)²/(2σ_j²)},   θ_j = (µ_j, σ_j²),

and α_j > 0, j = 1,..., m, Σ_{j=1}^m α_j = 1. We call α_j the prior probability of the j-th mixture component, for j = 1,..., m.
Suppose also, as in (4.1), that y = (y₁, y₂,..., y_n) denotes latent data from the distribution of the random variable Y = (Y₁, Y₂,..., Y_n), such that P(Y_i = j) = α_j, j = 1,..., m, i = 1,..., n. We calculate Q(θ, θ_t) for the above model. The conditional density of Y, given X = x and θ_t, is given by

(4.9)    f_{Y|X}(y|x, θ_t) = Π_{i=1}^n f_{Y|X}(y_i|x_i, θ_t),

and the joint density of X and Y is

(4.10)    f(x, y|θ) = Π_{i=1}^n f(x_i, y_i|θ_{y_i}),

which by the definition of the conditional density gives

(4.11)    f(x, y|θ) = Π_{i=1}^n α_{y_i} f_{y_i}(x_i|θ_{y_i}).

From formulae (4.9), (4.11) and (4.7), we obtain

(4.12)    Q(θ, θ_t) = E[ ln f(x, y|θ) | x, θ_t ] = E[ Σ_{i=1}^n ln ( α_{y_i} f_{y_i}(x_i|θ_{y_i}) ) | x, θ_t ] = Σ_{j=1}^m Σ_{i=1}^n ln ( α_j f_j(x_i|θ_j) ) P(Y_i = j | x_i, θ_t).

Thus, in each iteration t of the EM algorithm, the expectation step calculates

q_{ij}^t = P(Y_i = j | x_i, θ_t).

From now on, we will denote by α_j^t, µ_j^t and (σ_j^t)² the estimators of the parameters α_j, µ_j, σ_j² in iteration t.

Lemma 4.5. Under the above assumptions,

(4.13)    α_j^t = (1/n) Σ_{i=1}^n q_{ij}^t,

(4.14)    µ_j^t = Σ_{i=1}^n q_{ij}^t x_i / Σ_{i=1}^n q_{ij}^t,

(4.15)    (σ_j^t)² = Σ_{i=1}^n (x_i − µ_j^t)² q_{ij}^t / Σ_{i=1}^n q_{ij}^t.
Proof. To find α_j, we use a Lagrange multiplier λ, with the constraint Σ_{j=1}^m α_j = 1. For j = 1,..., m, we calculate

(∂/∂α_j) [ Q(θ, θ_t) + λ (1 − Σ_{j=1}^m α_j) ] = 0.

By formula (4.12),

(∂/∂α_j) [ Σ_{j=1}^m Σ_{i=1}^n ln ( α_j f_j(x_i|θ_j) ) q_{ij}^t + λ (1 − Σ_{j=1}^m α_j) ] = 0,   j = 1,..., m,

and hence

Σ_{i=1}^n q_{ij}^t = λ α_j,   j = 1,..., m.

Summing both sides over j, we get λ = n, and thus (4.13). To prove (4.14) and (4.15), we maximize the function

Q(θ, θ_t) = Σ_{j=1}^m Σ_{i=1}^n ln ( α_j f_j(x_i|θ_j) ) q_{ij}^t.

By the assumptions of the model, f_j(x|θ_j) = (1/√(2πσ_j²)) e^{−(x−µ_j)²/(2σ_j²)}, θ_j = (µ_j, σ_j²). We get the function

(4.16)    Q(µ_j, σ_j) = Σ_{j=1}^m Σ_{i=1}^n ( ln α_j − (1/2) ln(2π) − ln σ_j − (x_i − µ_j)²/(2σ_j²) ) q_{ij}^t.

By differentiating (4.16) with respect to µ_j and σ_j, for j = 1,..., m, we obtain

∂Q/∂µ_j = Σ_{i=1}^n ( (x_i − µ_j)/σ_j² ) q_{ij}^t,   ∂Q/∂σ_j = Σ_{i=1}^n ( −1/σ_j + (x_i − µ_j)²/σ_j³ ) q_{ij}^t.

Solving the equations Σ_{i=1}^n ( (x_i − µ_j)/σ_j² ) q_{ij}^t = 0 and Σ_{i=1}^n ( −1/σ_j + (x_i − µ_j)²/σ_j³ ) q_{ij}^t = 0, we get (4.14) and (4.15).
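Lemma 4.5 translates directly into code. A minimal Python sketch of one EM iteration for the normal mixture, with the E-step computing the posteriors q_ij and the M-step applying the update formulas (4.13)-(4.15); all function names are ours:

```python
import math

def normal_pdf(x, mu, var):
    # density of N(mu, var)
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_step(sample, alphas, mus, varis):
    # E-step: q_ij = P(Y_i = j | x_i, theta_t), the posterior component probabilities
    n, m = len(sample), len(alphas)
    q = []
    for xi in sample:
        w = [alphas[j] * normal_pdf(xi, mus[j], varis[j]) for j in range(m)]
        s = sum(w)
        q.append([wj / s for wj in w])
    # M-step: formulas (4.13)-(4.15) of Lemma 4.5
    new_alphas = [sum(q[i][j] for i in range(n)) / n for j in range(m)]
    new_mus, new_varis = [], []
    for j in range(m):
        sj = sum(q[i][j] for i in range(n))
        mu = sum(q[i][j] * sample[i] for i in range(n)) / sj
        var = sum(q[i][j] * (sample[i] - mu) ** 2 for i in range(n)) / sj
        new_mus.append(mu)
        new_varis.append(var)
    return new_alphas, new_mus, new_varis
```

Iterating `em_step` realizes the scheme below; by Theorem 4.4, the log-likelihood of the sample is non-decreasing along the iterations.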
The scheme of the EM algorithm for the mixture of normal densities can be stated as follows:

Start: We set α_j^0, µ_j^0 and (σ_j^0)², for j = 1, 2,..., m.

We iterate, for t = 1, 2,...:

E-step: We calculate q_{ij}^t.

M-step: We calculate

α_j^t = (1/n) Σ_{i=1}^n q_{ij}^t,   µ_j^t = Σ_{i=1}^n q_{ij}^t x_i / Σ_{i=1}^n q_{ij}^t,   (σ_j^t)² = Σ_{i=1}^n (x_i − µ_j^t)² q_{ij}^t / Σ_{i=1}^n q_{ij}^t.

The above scheme with the exact formulae can easily be used to construct the pilot function.

5. Example. In this section we present an example of how to apply the two-step method in practice. Suppose that we are given a sample of 200 independent observations from an absolutely continuous distribution with an unknown density f. In this example we consider a mixture of two Weibull densities, with parameters α₁ = 0.25, a₁ = 2.9, b₁ = 3.5 and α₂ = 0.75, a₂ = 1.6, b₂ = 0.6, which is nonsymmetric and bimodal. At first we construct a pilot function as a mixture of m = 2 normal densities

f_0(x) = α₁ f₁(x|µ₁, σ₁²) + α₂ f₂(x|µ₂, σ₂²),   where f_j(x|µ_j, σ_j²) = (1/√(2πσ_j²)) e^{−(x−µ_j)²/(2σ_j²)}, j = 1, 2,

using the EM algorithm. We set the initial values of the estimates of the parameters: α₁^0 = 0.5, α₂^0 = 0.5, µ₁^0 = 0.5, µ₂^0 = 0.5, (σ₁^0)² = 0.5, (σ₂^0)² = 0.5, and after 30 iterations of the EM algorithm we get α̂₁ = 0.315, µ̂₁ = 2.698, σ̂₁² = 1.560, α̂₂ = 0.685, µ̂₂ = 0.489, σ̂₂² =
Therefore, the pilot is given by the formula

f_0(x) = α̂₁ (1/√(2πσ̂₁²)) e^{−(x−µ̂₁)²/(2σ̂₁²)} + α̂₂ (1/√(2πσ̂₂²)) e^{−(x−µ̂₂)²/(2σ̂₂²)}.

In the second step we use the generalized kernel density estimator (2.1) with the Gaussian kernel. In this case

φ₁(x) = α̂₁ (1/√(2πσ̂₁²)) e^{−(x−µ̂₁)²/(2σ̂₁²)} / f_0(x),   φ₂(x) = α̂₂ (1/√(2πσ̂₂²)) e^{−(x−µ̂₂)²/(2σ̂₂²)} / f_0(x).

We obtain the bandwidths h₁ and h₂ by minimizing AMISE(f_φ) given by formula (3.3). By numerical minimization, we get h₁ = and h₂ = 0.0. Hence, we have the density estimator given by the formula

f_φ(x) = (1/200) Σ_{i=1}^{200} ( (φ₁(x_i)/h₁) K((x − x_i)/h₁) + (φ₂(x_i)/h₂) K((x − x_i)/h₂) ),

where x₁,..., x₂₀₀ denote the given sample, and φ₁(x), φ₂(x), h₁, h₂ are calculated above. To show the results of the estimation, we present the graphs of the estimator f_φ and the estimated density in the picture below.

[Figure: the estimator f_φ (solid line) and the estimated density f (dotted line).]

One can see that the estimator f_φ (solid line) is very close to the estimated density f (dotted line), which gives evidence of the high efficiency of the proposed method.
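The example above can be reproduced in outline. A Python sketch of the two-step pipeline, with user-supplied bandwidths standing in for the numerical minimization of (3.3), and a simple deterministic initialization of the EM means from the data range (all names and initialization choices are ours):

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_fit(sample, m=2, iters=30):
    # Step 1: fit the parametric pilot f_0 = sum_j alpha_j N(mu_j, sigma_j^2) by EM.
    n = len(sample)
    alphas = [1.0 / m] * m
    mus = [min(sample) + (max(sample) - min(sample)) * j / (m - 1) for j in range(m)]
    varis = [1.0] * m
    for _ in range(iters):
        q = []
        for xi in sample:
            w = [alphas[j] * normal_pdf(xi, mus[j], varis[j]) for j in range(m)]
            s = sum(w)
            q.append([wj / s for wj in w])
        alphas = [sum(q[i][j] for i in range(n)) / n for j in range(m)]
        for j in range(m):
            sj = sum(q[i][j] for i in range(n))
            mus[j] = sum(q[i][j] * sample[i] for i in range(n)) / sj
            varis[j] = sum(q[i][j] * (sample[i] - mus[j]) ** 2 for i in range(n)) / sj
    return alphas, mus, varis

def two_step_estimator(sample, hs, alphas, mus, varis):
    # Step 2: generalized kernel estimator (2.1) with Gaussian kernel and
    # weights phi_j(x) = alpha_j f_j(x) / f_0(x) from formula (2.2).
    m = len(alphas)

    def phi(j, x):
        w = [alphas[k] * normal_pdf(x, mus[k], varis[k]) for k in range(m)]
        return w[j] / sum(w)

    n = len(sample)

    def f_phi(x):
        total = 0.0
        for xi in sample:
            for j, h in enumerate(hs):
                total += phi(j, xi) / h * normal_pdf((x - xi) / h, 0.0, 1.0)
        return total / n

    return f_phi
```

In the paper, the bandwidths h₁, h₂ come from minimizing (3.3) with the pilot plugged in for f; here they are simply arguments, so the sketch shows the structure of the estimator rather than the full optimization.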
References

1. Abramson I.S., Arbitrariness of the Pilot Estimator in Adaptive Kernel Methods, J. Multivariate Anal., 12 (1982).
2. Dempster A.P., Laird N.M., Rubin D.B., Maximum Likelihood from Incomplete Data via the EM Algorithm, J. Roy. Statist. Soc. Ser. B (1977).
3. Devroye L., Györfi L., Nonparametric Density Estimation: the L1 View, Wiley, New York.
4. Domański Cz., Pruska K., Nieklasyczne metody statystyczne [Nonclassical statistical methods], PWE, Warszawa.
5. Hall P., Marron J.S., Choice of Kernel Order in Density Estimation, Ann. Statist., 16 (1987).
6. Hjort N.L., Glad I.K., Nonparametric Density Estimation with a Parametric Start, Ann. Statist., 23, 3 (1995).
7. Jarnicka J., Dwukrokowa metoda estymacji gęstości [A two-step density estimation method], preprint.
8. Marron J.S., Wand M.P., Exact Mean Integrated Squared Error, Ann. Statist., 20, 2 (1992).
9. Park B.U., Marron J.S., Comparison of Data-Driven Bandwidth Selectors, J. Amer. Statist. Assoc.
10. Parzen E., On Estimation of a Probability Density Function and Mode, Ann. Math. Statist.
11. Rosenblatt M., Remarks on some Nonparametric Estimates of a Density Function, Ann. Math. Statist.
12. Sheather S.J., Jones M.C., A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation, J. Roy. Statist. Soc. Ser. B, 53, 3 (1991).
13. Silverman B.W., Density Estimation for Statistics and Data Analysis, Chapman and Hall, London, New York.
14. Terrell G.R., Scott D.W., Variable Kernel Density Estimation, Ann. Statist., 20, 3 (1992).
15. Wand M.P., Marron J.S., Ruppert D., Transformations in Density Estimation, J. Amer. Statist. Assoc., 86, 414 (1991).
16. Wu C.F.J., On the Convergence Properties of the EM Algorithm, Ann. Statist., 11 (1983).

Received November 2. jj@gamma.im.uj.edu.pl
Density estimators for the convolution of discrete and continuous random variables Ursula U Müller Texas A&M University Anton Schick Binghamton University Wolfgang Wefelmeyer Universität zu Köln Abstract
More informationEstimation of a quadratic regression functional using the sinc kernel
Estimation of a quadratic regression functional using the sinc kernel Nicolai Bissantz Hajo Holzmann Institute for Mathematical Stochastics, Georg-August-University Göttingen, Maschmühlenweg 8 10, D-37073
More informationMiscellanea Kernel density estimation and marginalization consistency
Biometrika (1991), 78, 2, pp. 421-5 Printed in Great Britain Miscellanea Kernel density estimation and marginalization consistency BY MIKE WEST Institute of Statistics and Decision Sciences, Duke University,
More informationENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM
c 2007-2016 by Armand M. Makowski 1 ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM 1 The basic setting Throughout, p, q and k are positive integers. The setup With
More informationPARAMETER CONVERGENCE FOR EM AND MM ALGORITHMS
Statistica Sinica 15(2005), 831-840 PARAMETER CONVERGENCE FOR EM AND MM ALGORITHMS Florin Vaida University of California at San Diego Abstract: It is well known that the likelihood sequence of the EM algorithm
More information1 Introduction A central problem in kernel density estimation is the data-driven selection of smoothing parameters. During the recent years, many dier
Bias corrected bootstrap bandwidth selection Birgit Grund and Jorg Polzehl y January 1996 School of Statistics, University of Minnesota Technical Report No. 611 Abstract Current bandwidth selectors for
More informationStatistical Estimation
Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from
More informationStatistics: Learning models from data
DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial
More informationPointwise convergence rates and central limit theorems for kernel density estimators in linear processes
Pointwise convergence rates and central limit theorems for kernel density estimators in linear processes Anton Schick Binghamton University Wolfgang Wefelmeyer Universität zu Köln Abstract Convergence
More informationIEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm
IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.
More informationAdaptive Kernel Estimation of The Hazard Rate Function
Adaptive Kernel Estimation of The Hazard Rate Function Raid Salha Department of Mathematics, Islamic University of Gaza, Palestine, e-mail: rbsalha@mail.iugaza.edu Abstract In this paper, we generalized
More informationIssues on Fitting Univariate and Multivariate Hyperbolic Distributions
Issues on Fitting Univariate and Multivariate Hyperbolic Distributions Thomas Tran PhD student Department of Statistics University of Auckland thtam@stat.auckland.ac.nz Fitting Hyperbolic Distributions
More informationParameter Estimation
Parameter Estimation Consider a sample of observations on a random variable Y. his generates random variables: (y 1, y 2,, y ). A random sample is a sample (y 1, y 2,, y ) where the random variables y
More informationChapter 4. Theory of Tests. 4.1 Introduction
Chapter 4 Theory of Tests 4.1 Introduction Parametric model: (X, B X, P θ ), P θ P = {P θ θ Θ} where Θ = H 0 +H 1 X = K +A : K: critical region = rejection region / A: acceptance region A decision rule
More informationAkaike Information Criterion to Select the Parametric Detection Function for Kernel Estimator Using Line Transect Data
Journal of Modern Applied Statistical Methods Volume 12 Issue 2 Article 21 11-1-2013 Akaike Information Criterion to Select the Parametric Detection Function for Kernel Estimator Using Line Transect Data
More informationAsymptotics for posterior hazards
Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and
More informationA Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models
A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models Jeff A. Bilmes (bilmes@cs.berkeley.edu) International Computer Science Institute
More information12 - Nonparametric Density Estimation
ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6
More informationISSN Asymptotic Confidence Bands for Density and Regression Functions in the Gaussian Case
Journal Afrika Statistika Journal Afrika Statistika Vol 5, N,, page 79 87 ISSN 85-35 Asymptotic Confidence Bs for Density egression Functions in the Gaussian Case Nahima Nemouchi Zaher Mohdeb Department
More informationSpring 2012 Math 541B Exam 1
Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote
More informationOptimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.
Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may
More informationA PROBABILITY DENSITY FUNCTION ESTIMATION USING F-TRANSFORM
K Y BERNETIKA VOLUM E 46 ( 2010), NUMBER 3, P AGES 447 458 A PROBABILITY DENSITY FUNCTION ESTIMATION USING F-TRANSFORM Michal Holčapek and Tomaš Tichý The aim of this paper is to propose a new approach
More informationKernel Density Estimation
Kernel Density Estimation and Application in Discriminant Analysis Thomas Ledl Universität Wien Contents: Aspects of Application observations: 0 Which distribution? 0?? 0.0 0. 0. 0. 0.0 0. 0. 0 0 0.0
More informationFall TMA4145 Linear Methods. Solutions to exercise set 9. 1 Let X be a Hilbert space and T a bounded linear operator on X.
TMA445 Linear Methods Fall 26 Norwegian University of Science and Technology Department of Mathematical Sciences Solutions to exercise set 9 Let X be a Hilbert space and T a bounded linear operator on
More informationBoundary Correction Methods in Kernel Density Estimation Tom Alberts C o u(r)a n (t) Institute joint work with R.J. Karunamuni University of Alberta
Boundary Correction Methods in Kernel Density Estimation Tom Alberts C o u(r)a n (t) Institute joint work with R.J. Karunamuni University of Alberta November 29, 2007 Outline Overview of Kernel Density
More informationNew mixture models and algorithms in the mixtools package
New mixture models and algorithms in the mixtools package Didier Chauveau MAPMO - UMR 6628 - Université d Orléans 1ères Rencontres BoRdeaux Juillet 2012 Outline 1 Mixture models and EM algorithm-ology
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.1 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationA tailor made nonparametric density estimate
A tailor made nonparametric density estimate Daniel Carando 1, Ricardo Fraiman 2 and Pablo Groisman 1 1 Universidad de Buenos Aires 2 Universidad de San Andrés School and Workshop on Probability Theory
More informationNonparametric Density Estimation
Nonparametric Density Estimation Econ 690 Purdue University Justin L. Tobias (Purdue) Nonparametric Density Estimation 1 / 29 Density Estimation Suppose that you had some data, say on wages, and you wanted
More informationStatistics 3858 : Maximum Likelihood Estimators
Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,
More informationBoosting kernel density estimates: A bias reduction technique?
Biometrika (2004), 91, 1, pp. 226 233 2004 Biometrika Trust Printed in Great Britain Boosting kernel density estimates: A bias reduction technique? BY MARCO DI MARZIO Dipartimento di Metodi Quantitativi
More informationCORRECTION OF DENSITY ESTIMATORS WHICH ARE NOT DENSITIES
CORRECTION OF DENSITY ESTIMATORS WHICH ARE NOT DENSITIES I.K. Glad\ N.L. Hjort 1 and N.G. Ushakov 2 * November 1999 1 Department of Mathematics, University of Oslo, Norway 2Russian Academy of Sciences,
More informationA Note on Data-Adaptive Bandwidth Selection for Sequential Kernel Smoothers
6th St.Petersburg Workshop on Simulation (2009) 1-3 A Note on Data-Adaptive Bandwidth Selection for Sequential Kernel Smoothers Ansgar Steland 1 Abstract Sequential kernel smoothers form a class of procedures
More informationRandom Matrix Eigenvalue Problems in Probabilistic Structural Mechanics
Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics S Adhikari Department of Aerospace Engineering, University of Bristol, Bristol, U.K. URL: http://www.aer.bris.ac.uk/contact/academic/adhikari/home.html
More informationEM for Spherical Gaussians
EM for Spherical Gaussians Karthekeyan Chandrasekaran Hassan Kingravi December 4, 2007 1 Introduction In this project, we examine two aspects of the behavior of the EM algorithm for mixtures of spherical
More informationAsymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands
Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department
More informationSmooth simultaneous confidence bands for cumulative distribution functions
Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang
More informationAnalysis methods of heavy-tailed data
Institute of Control Sciences Russian Academy of Sciences, Moscow, Russia February, 13-18, 2006, Bamberg, Germany June, 19-23, 2006, Brest, France May, 14-19, 2007, Trondheim, Norway PhD course Chapter
More informationLocal Polynomial Regression
VI Local Polynomial Regression (1) Global polynomial regression We observe random pairs (X 1, Y 1 ),, (X n, Y n ) where (X 1, Y 1 ),, (X n, Y n ) iid (X, Y ). We want to estimate m(x) = E(Y X = x) based
More informationA new method of nonparametric density estimation
A new method of nonparametric density estimation Andrey Pepelyshev Cardi December 7, 2011 1/32 A. Pepelyshev A new method of nonparametric density estimation Contents Introduction A new density estimate
More informationMore on Estimation. Maximum Likelihood Estimation.
More on Estimation. In the previous chapter we looked at the properties of estimators and the criteria we could use to choose between types of estimators. Here we examine more closely some very popular
More informationSUPPLEMENTARY MATERIAL FOR PUBLICATION ONLINE 1
SUPPLEMENTARY MATERIAL FOR PUBLICATION ONLINE 1 B Technical details B.1 Variance of ˆf dec in the ersmooth case We derive the variance in the case where ϕ K (t) = ( 1 t 2) κ I( 1 t 1), i.e. we take K as
More informationEM Algorithm. Expectation-maximization (EM) algorithm.
EM Algorithm Outline: Expectation-maximization (EM) algorithm. Examples. Reading: A.P. Dempster, N.M. Laird, and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc.,
More informationOptimal Estimation of a Nonsmooth Functional
Optimal Estimation of a Nonsmooth Functional T. Tony Cai Department of Statistics The Wharton School University of Pennsylvania http://stat.wharton.upenn.edu/ tcai Joint work with Mark Low 1 Question Suppose
More informationVariance Function Estimation in Multivariate Nonparametric Regression
Variance Function Estimation in Multivariate Nonparametric Regression T. Tony Cai 1, Michael Levine Lie Wang 1 Abstract Variance function estimation in multivariate nonparametric regression is considered
More informationSMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES
Statistica Sinica 19 (2009), 71-81 SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Song Xi Chen 1,2 and Chiu Min Wong 3 1 Iowa State University, 2 Peking University and
More informationOptimization Methods II. EM algorithms.
Aula 7. Optimization Methods II. 0 Optimization Methods II. EM algorithms. Anatoli Iambartsev IME-USP Aula 7. Optimization Methods II. 1 [RC] Missing-data models. Demarginalization. The term EM algorithms
More informationChapter 3 : Likelihood function and inference
Chapter 3 : Likelihood function and inference 4 Likelihood function and inference The likelihood Information and curvature Sufficiency and ancilarity Maximum likelihood estimation Non-regular models EM
More informationQuantitative Economics for the Evaluation of the European Policy. Dipartimento di Economia e Management
Quantitative Economics for the Evaluation of the European Policy Dipartimento di Economia e Management Irene Brunetti 1 Davide Fiaschi 2 Angela Parenti 3 9 ottobre 2015 1 ireneb@ec.unipi.it. 2 davide.fiaschi@unipi.it.
More information1 Solution to Problem 2.1
Solution to Problem 2. I incorrectly worked this exercise instead of 2.2, so I decided to include the solution anyway. a) We have X Y /3, which is a - function. It maps the interval, ) where X lives) onto
More informationModel Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao
Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley
More informationA Note on the Expectation-Maximization (EM) Algorithm
A Note on the Expectation-Maximization (EM) Algorithm ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign March 11, 2007 1 Introduction The Expectation-Maximization
More informationEstimation for nonparametric mixture models
Estimation for nonparametric mixture models David Hunter Penn State University Research supported by NSF Grant SES 0518772 Joint work with Didier Chauveau (University of Orléans, France), Tatiana Benaglia
More informationThe Expectation Maximization or EM algorithm
The Expectation Maximization or EM algorithm Carl Edward Rasmussen November 15th, 2017 Carl Edward Rasmussen The EM algorithm November 15th, 2017 1 / 11 Contents notation, objective the lower bound functional,
More informationNONPARAMETRIC DENSITY ESTIMATION WITH RESPECT TO THE LINEX LOSS FUNCTION
NONPARAMETRIC DENSITY ESTIMATION WITH RESPECT TO THE LINEX LOSS FUNCTION R. HASHEMI, S. REZAEI AND L. AMIRI Department of Statistics, Faculty of Science, Razi University, 67149, Kermanshah, Iran. ABSTRACT
More informationLecture 4: Probabilistic Learning
DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods
More information1 Introduction. 2 Measure theoretic definitions
1 Introduction These notes aim to recall some basic definitions needed for dealing with random variables. Sections to 5 follow mostly the presentation given in chapter two of [1]. Measure theoretic definitions
More informationSparse Nonparametric Density Estimation in High Dimensions Using the Rodeo
Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Han Liu John Lafferty Larry Wasserman Statistics Department Computer Science Department Machine Learning Department Carnegie Mellon
More informationA PRACTICAL WAY FOR ESTIMATING TAIL DEPENDENCE FUNCTIONS
Statistica Sinica 20 2010, 365-378 A PRACTICAL WAY FOR ESTIMATING TAIL DEPENDENCE FUNCTIONS Liang Peng Georgia Institute of Technology Abstract: Estimating tail dependence functions is important for applications
More informationDESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA
Statistica Sinica 18(2008), 515-534 DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA Kani Chen 1, Jianqing Fan 2 and Zhezhen Jin 3 1 Hong Kong University of Science and Technology,
More informationTeruko Takada Department of Economics, University of Illinois. Abstract
Nonparametric density estimation: A comparative study Teruko Takada Department of Economics, University of Illinois Abstract Motivated by finance applications, the objective of this paper is to assess
More informationMIXTURE OF EXPERTS ARCHITECTURES FOR NEURAL NETWORKS AS A SPECIAL CASE OF CONDITIONAL EXPECTATION FORMULA
MIXTURE OF EXPERTS ARCHITECTURES FOR NEURAL NETWORKS AS A SPECIAL CASE OF CONDITIONAL EXPECTATION FORMULA Jiří Grim Department of Pattern Recognition Institute of Information Theory and Automation Academy
More informationRandom Eigenvalue Problems Revisited
Random Eigenvalue Problems Revisited S Adhikari Department of Aerospace Engineering, University of Bristol, Bristol, U.K. Email: S.Adhikari@bristol.ac.uk URL: http://www.aer.bris.ac.uk/contact/academic/adhikari/home.html
More informationChapter 3. Point Estimation. 3.1 Introduction
Chapter 3 Point Estimation Let (Ω, A, P θ ), P θ P = {P θ θ Θ}be probability space, X 1, X 2,..., X n : (Ω, A) (IR k, B k ) random variables (X, B X ) sample space γ : Θ IR k measurable function, i.e.
More informationMaximum Likelihood Estimation. only training data is available to design a classifier
Introduction to Pattern Recognition [ Part 5 ] Mahdi Vasighi Introduction Bayesian Decision Theory shows that we could design an optimal classifier if we knew: P( i ) : priors p(x i ) : class-conditional
More informationCurve Fitting Re-visited, Bishop1.2.5
Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the
More informationLocal Polynomial Wavelet Regression with Missing at Random
Applied Mathematical Sciences, Vol. 6, 2012, no. 57, 2805-2819 Local Polynomial Wavelet Regression with Missing at Random Alsaidi M. Altaher School of Mathematical Sciences Universiti Sains Malaysia 11800
More informationLatent Variable Models and EM algorithm
Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic
More informationA regeneration proof of the central limit theorem for uniformly ergodic Markov chains
A regeneration proof of the central limit theorem for uniformly ergodic Markov chains By AJAY JASRA Department of Mathematics, Imperial College London, SW7 2AZ, London, UK and CHAO YANG Department of Mathematics,
More informationParametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a
Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a
More informationBasic math for biology
Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood
More informationOPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1
The Annals of Statistics 1997, Vol. 25, No. 6, 2512 2546 OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1 By O. V. Lepski and V. G. Spokoiny Humboldt University and Weierstrass Institute
More informationA Bayesian approach to parameter estimation for kernel density estimation via transformations
A Bayesian approach to parameter estimation for kernel density estimation via transformations Qing Liu,, David Pitt 2, Xibin Zhang 3, Xueyuan Wu Centre for Actuarial Studies, Faculty of Business and Economics,
More informationarxiv: v2 [stat.me] 13 Sep 2007
Electronic Journal of Statistics Vol. 0 (0000) ISSN: 1935-7524 DOI: 10.1214/154957804100000000 Bandwidth Selection for Weighted Kernel Density Estimation arxiv:0709.1616v2 [stat.me] 13 Sep 2007 Bin Wang
More informationAssignment # 1. STA 355 (K. Knight) February 8, (a) by the Central Limit Theorem. n(g( Xn ) g(µ)) by the delta method ( ) (b) i.
STA 355 (K. Knight) February 8, 2014 Assignment # 1 Saša Milić 997 410 626 1. (a) n( n ) d N (0, σ 2 ()) by the Central Limit Theorem n(g( n ) g()) d g (θ)n (0, σ 2 ()) by the delta method σ 2 ()g (θ)
More informationFrontier estimation based on extreme risk measures
Frontier estimation based on extreme risk measures by Jonathan EL METHNI in collaboration with Ste phane GIRARD & Laurent GARDES CMStatistics 2016 University of Seville December 2016 1 Risk measures 2
More informationBrownian Motion. An Undergraduate Introduction to Financial Mathematics. J. Robert Buchanan. J. Robert Buchanan Brownian Motion
Brownian Motion An Undergraduate Introduction to Financial Mathematics J. Robert Buchanan 2010 Background We have already seen that the limiting behavior of a discrete random walk yields a derivation of
More informationPositive data kernel density estimation via the logkde package for R
Positive data kernel density estimation via the logkde package for R Andrew T. Jones 1, Hien D. Nguyen 2, and Geoffrey J. McLachlan 1 which is constructed from the sample { i } n i=1. Here, K (x) is a
More informationSTA 294: Stochastic Processes & Bayesian Nonparametrics
MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a
More informationComputational Statistics. Jian Pei School of Computing Science Simon Fraser University
Computational Statistics Jian Pei School of Computing Science Simon Fraser University jpei@cs.sfu.ca BASIC OPTIMIZATION METHODS J. Pei: Computational Statistics 2 Why Optimization? In statistical inference,
More informationKullback-Leibler Designs
Kullback-Leibler Designs Astrid JOURDAN Jessica FRANCO Contents Contents Introduction Kullback-Leibler divergence Estimation by a Monte-Carlo method Design comparison Conclusion 2 Introduction Computer
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationNonparametric Function Estimation with Infinite-Order Kernels
Nonparametric Function Estimation with Infinite-Order Kernels Arthur Berg Department of Statistics, University of Florida March 15, 2008 Kernel Density Estimation (IID Case) Let X 1,..., X n iid density
More information