NONPARAMETRIC MIXTURE OF REGRESSION MODELS
Huang, M., & Li, R.

The Pennsylvania State University Technical Report Series #09-93
College of Health and Human Development
The Pennsylvania State University
Mian Huang
Department of Statistics, Shanghai University of Finance and Economics, Shanghai, P. R. China

Runze Li
Department of Statistics and The Methodology Center, The Pennsylvania State University, University Park, PA, USA

Abstract: Motivated by an analysis of US house price index data, we propose nonparametric finite mixture of regression models. We develop an estimation procedure for the newly proposed models using the EM algorithm and kernel regression. The asymptotic properties of the proposed estimate are systematically studied: we derive its asymptotic bias and variance and establish its asymptotic normality. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed estimation procedure, and the methodology is illustrated with an empirical analysis of the US house price index data.

Key words and phrases: EM algorithm, kernel regression, mixture of regression models, nonparametric regression.

1. Introduction

Mixture models have been widely used in econometrics and social science, and the theory of mixture models has been well studied (Lindsay, 1995). Finite mixtures of linear regression models, a useful class of mixture models, have been applied in various fields since their introduction in Goldfeld and Quandt (1976): in econometrics and marketing (DeSarbo, 1993; Frühwirth-Schnatter, 2001; Rossi et al., 2005), in epidemiology (Green and Richardson, 2002), and in biology (Wang et al., 1996). Bayesian approaches for mixture of regression models are summarized in Frühwirth-Schnatter (2005). Many efforts have been devoted to these models
and their extensions, such as finite mixtures of generalized linear models; these are comprehensively summarized in McLachlan and Peel (2000).

Motivated by an analysis of the US house price index data in section 3.4, we propose nonparametric finite mixtures of regression models. Compared with finite mixtures of linear regression models, the newly proposed models relax the linearity assumption on the regression function and allow the regression function in each component to be an unknown but smooth function of its covariates. In this paper we consider the situation in which the mixing proportions, the mean functions and the variance functions are all nonparametric. Using a kernel regression technique, we develop an estimation procedure for the unknown functions in the nonparametric mixture of regression model via a local likelihood approach. The sampling properties of the proposed estimation procedure are investigated: we derive the asymptotic bias and variance of the local likelihood estimate and establish its asymptotic normality.

For estimation, one may naively implement an EM algorithm (Dempster et al., 1977) to maximize the local likelihood function. However, one typically wants to estimate the whole curves by evaluating the resulting estimates over a set of grid points, and a naive implementation of the EM algorithm does not ensure that the component labels match correctly across different grid points. This is similar to the label switching problem in other applications of mixture modeling. We therefore modify the EM algorithm to simultaneously maximize the local likelihood functions for the proposed nonparametric mixture of regression models at the whole set of grid points. The modified EM algorithm works well in our simulations and in a real data example. We further study the ascent property of the proposed EM algorithm, and derive a standard error formula for the resulting estimate from the conventional sandwich formula.
A bandwidth selector is proposed for the local likelihood estimate using a multi-fold cross-validation method. A simulation study is conducted to examine the performance of the proposed procedure and test the accuracy of the proposed standard error formula. We further demonstrate the proposed model and estimation procedure by an empirical analysis of US house price index data. The rest of this paper is structured as follows. In section 2, we present
the nonparametric finite mixture of regression models and derive asymptotic properties of the local likelihood estimate. We further develop an estimation procedure for the proposed model using the EM algorithm and kernel regression. Simulation results and an empirical analysis of a real data set are presented in section 3. Some discussion is provided in section 4. Technical conditions and proofs are given in section 5.

2. Estimation Procedure and its Sampling Properties

2.1. Model Definition

Suppose that {(x_i, y_i), i = 1, …, n} is a random sample from the population (X, Y). Throughout this paper we assume X is univariate. The proposed methodology and theoretical results can be extended to multivariate X, but the extension is less useful due to the curse of dimensionality. Let C be a latent class variable, and assume that conditional on X = x, C has a discrete distribution P(C = c | X = x) = π_c(x) for c = 1, 2, …, C. Conditional on C = c and X = x, Y follows a normal distribution with mean m_c(x) and variance σ²_c(x), where m_c(·) and σ²_c(·) are unknown but smooth functions. In other words, conditional on X = x, the response Y follows a finite mixture of normals

    Σ_{c=1}^C π_c(x) N{m_c(x), σ²_c(x)}.  (2.1)

In this paper we assume that the number of components C is fixed, and we refer to model (2.1) as a nonparametric finite mixture of regression models because π_c(·), m_c(·) and σ²_c(·) are nonparametric. We discuss how to determine C in section 3. When π_c(x) and σ²_c(x) are constant and m_c(x) is linear in x, model (2.1) reduces to a finite mixture of linear regression models (Goldfeld and Quandt, 1976). When C = 1, model (2.1) is a nonparametric regression model; see Chapter 2 of Fan and Gijbels (1996) for a review. Thus model (2.1) can be regarded as a natural extension of both nonparametric regression models and finite mixtures of linear regression models.
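To make the data-generating mechanism of model (2.1) concrete, here is a minimal sketch of drawing a sample from a 2-component version. The particular choices of π_1, the mean functions and the standard deviation functions below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sample_mixture(n, pi1, m, s, rng):
    """Draw (x_i, y_i, c_i) from model (2.1) with C = 2 components:
    given X = x, the label c is 1 with probability pi1(x), else 2, and
    Y | (c, x) ~ N{m[c-1](x), s[c-1](x)^2}."""
    x = rng.uniform(0.0, 1.0, size=n)
    c = np.where(rng.uniform(size=n) < pi1(x), 1, 2)   # latent class labels
    mean = np.where(c == 1, m[0](x), m[1](x))
    sd = np.where(c == 1, s[0](x), s[1](x))
    y = rng.normal(mean, sd)
    return x, y, c

# Illustrative component functions (hypothetical, for demonstration only):
rng = np.random.default_rng(0)
pi1 = lambda x: 0.5 + 0.3 * x                              # pi_1(x)
m = [lambda x: np.sin(2 * np.pi * x), lambda x: 2.0 + x]   # m_1(x), m_2(x)
s = [lambda x: 0.3 + 0.1 * x, lambda x: 0.2 + 0.2 * x]     # sigma_1(x), sigma_2(x)
x, y, c = sample_mixture(500, pi1, m, s, rng)
```

All three ingredients of (2.1) vary smoothly with x here, which is exactly the structure the nonparametric estimation procedure below is designed to recover.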
It is easy to adapt the conventional constraints imposed on finite mixtures of linear regression models to the proposed model so that it is identifiable and the corresponding likelihood function is bounded. For further references on these issues, see Hennig (2000) and Hathaway (1985).

Since π_c(·), m_c(·) and σ²_c(·) are nonparametric, we need nonparametric smoothing techniques for (2.1). In this paper we employ kernel regression. Suppose we want to estimate the unknown functions at x_0. In kernel regression, we first use local constants π_{c0}, m_{c0} and σ²_{c0} to approximate π_c(x_0), m_c(x_0) and σ²_c(x_0). Let K(·) be a kernel function and K_h(·) = h⁻¹K(·/h) its rescaling with bandwidth h. Furthermore, denote by φ(y | μ, σ²) the density function of N(μ, σ²). Then the corresponding local log-likelihood function for the data {(x_i, y_i), i = 1, 2, …, n} is

    ℓ_n(π_0, σ²_0, m_0) = Σ_{i=1}^n log[ Σ_{c=1}^C π_{c0} φ(y_i | m_{c0}, σ²_{c0}) ] K_h(x_i − x_0),  (2.2)

where m_0 = (m_{10}, …, m_{C0})^T, σ²_0 = (σ²_{10}, …, σ²_{C0})^T, π_0 = (π_{10}, …, π_{C−1,0})^T, and π_{C0} = 1 − Σ_{c=1}^{C−1} π_{c0}.

One may also apply local linear or local polynomial regression techniques to estimate π_c(x), m_c(x) and σ²_c(x). Local linear regression has several nice statistical properties (Fan and Gijbels, 1996). However, in our model setting local linear regression does not yield closed-form solutions for the variance functions and mixing proportion functions in the M-step of the proposed EM algorithm; see (2.7), (2.8), and (2.9) for details. Thus we use kernel regression to estimate the nonparametric functions. Let {π̂_0, σ̂²_0, m̂_0} be the maximizer of the local likelihood function (2.2). Then the estimates of π_c(x_0), σ²_c(x_0) and m_c(x_0) are π̂_c(x_0) = π̂_{c0}, σ̂²_c(x_0) = σ̂²_{c0}, and m̂_c(x_0) = m̂_{c0}.

2.2. Asymptotic Properties

We next study the asymptotic properties of π̂_c(x_0), σ̂²_c(x_0) and m̂_c(x_0). Let θ = (π^T, σ^{2T}, m^T)^T, and denote

    η(y | θ) = Σ_{c=1}^C π_c φ(y | m_c, σ²_c),  and  ℓ(θ, y) = log η(y | θ).

Let θ(x_0) = {π(x_0)^T, σ²(x_0)^T, m(x_0)^T}^T, and define

    η_1{θ(x), y} = ∂η{y | θ(x)}/∂θ,  η_2{θ(x), y} = ∂²η{y | θ(x)}/∂θ∂θ^T,
    q_1{θ(x), y} = ∂ℓ{θ(x), y}/∂θ,  q_2{θ(x), y} = ∂²ℓ{θ(x), y}/∂θ∂θ^T,
    I(x_0) = −E[q_2{θ(x_0), Y} | X = x_0],  and  Λ(x) = ∫ q_1{θ(x_0), y} η{y | θ(x)} dy.
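The local log-likelihood (2.2) can be evaluated directly from its definition. The sketch below assumes the Epanechnikov kernel (the kernel used in section 3) and takes the data and local constants in the usage as arbitrary illustrations, not values from the paper.

```python
import numpy as np

def normal_pdf(y, mu, s2):
    """Density phi(y | mu, s2) of N(mu, s2)."""
    return np.exp(-(y - mu) ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on |u| <= 1."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_loglik(x, y, x0, h, pi0, m0, s20):
    """Local log-likelihood (2.2): kernel-weighted log of the mixture
    density, with local constants pi0, m0, s20 (one entry per component)."""
    w = epanechnikov((x - x0) / h) / h                 # K_h(x_i - x0)
    dens = sum(p * normal_pdf(y, m, s2)
               for p, m, s2 in zip(pi0, m0, s20))      # mixture density at y_i
    return np.sum(np.log(dens) * w)

# Illustrative data and local constants at x0 = 0.15:
x = np.array([0.0, 0.1, 0.2, 0.3])
y = np.array([0.0, 0.1, 0.2, 0.3])
good = local_loglik(x, y, 0.15, 0.5, [0.5, 0.5], [0.0, 0.3], [1.0, 1.0])
bad = local_loglik(x, y, 0.15, 0.5, [0.5, 0.5], [5.0, 6.0], [1.0, 1.0])
```

As expected, local constants close to the data (`good`) yield a higher local log-likelihood than badly misplaced means (`bad`); the EM algorithm of section 2.3 maximizes this criterion in closed form.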
Denote γ_n = (nh)^{−1/2},

    m̂*_c = √(nh){m̂_{c0} − m_c(x_0)},  σ̂²*_c = √(nh){σ̂²_{c0} − σ²_c(x_0)},  π̂*_c = √(nh){π̂_{c0} − π_c(x_0)},
    π̂*_C = √(nh){π̂_{C0} − π_C(x_0)} = −√(nh) Σ_{c=1}^{C−1} {π̂_{c0} − π_c(x_0)}.

Let m̂* = (m̂*_1, …, m̂*_C)^T, σ̂²* = (σ̂²*_1, …, σ̂²*_C)^T, π̂* = (π̂*_1, …, π̂*_{C−1})^T, and θ̂* = {(π̂*)^T, (σ̂²*)^T, (m̂*)^T}^T. The asymptotic bias, variance and normality of the resulting estimate are given in the following theorem, whose proof is given in section 5.

Theorem 1. Suppose that conditions (A)–(F) in section 5 hold. Then

    √(nh){γ_n θ̂* − B(x_0) + o_p(h²)} →_D N{0, f^{−1}(x_0) ν_0 I^{−1}(x_0)},

where f(·) is the marginal density function of X,

    B(x_0) = I^{−1}(x_0){Λ'(x_0) f'(x_0)/f(x_0) + ½ Λ''(x_0)} κ_2 h²,

κ_l = ∫ u^l K(u) du, and ν_l = ∫ u^l K²(u) du.

2.3. An Effective EM Algorithm

For a given x_0, one may easily maximize the local likelihood function (2.2) with an EM algorithm. In practice, however, we typically want to evaluate the unknown functions at a set of grid points over an interval of x, and hence must maximize the local likelihood function (2.2) at each grid point. This poses a challenge: the component labels produced by the EM algorithm may differ across grid points, which is similar to the label switching problem that occurs when one conducts a bootstrap to get confidence intervals for the parameters of a mixture model. Thus, a naive implementation of the EM algorithm may fail to yield smooth estimated curves. An illustration is given in Figure 2.1, in which the solid curves are obtained when the labels at the grid points u_{j−1}, u_j and u_{j+1} match correctly, while the dotted curves are obtained when the labels at u_j do not match those at u_{j−1} and u_{j+1}.
[Figure 2.1 about here: y plotted against the grid points u_{j−1}, u_j, u_{j+1}, with and without label switching.]

Figure 2.1: An illustration of mismatched labels in a naive implementation of the EM algorithm. Solid curves are obtained when the labels at u_{j−1}, u_j and u_{j+1} match correctly. Dotted curves are obtained when the labels at u_j do not match those at u_{j−1} and u_{j+1}.

In this section, we propose an effective EM algorithm to deal with this issue. The key idea is that in the M-step of the EM algorithm we update the estimated curves at all grid points for the labels given in the E-step. Define Bernoulli random variables

    z_{ic} = 1 if (x_i, y_i) is in the c-th group, and z_{ic} = 0 otherwise,

and let z_i = (z_{i1}, …, z_{iC})^T. The complete data are {(x_i, y_i, z_i), i = 1, 2, …, n}, and the complete log-likelihood function is

    Σ_{i=1}^n Σ_{c=1}^C z_{ic} [ log π_c(x_i) + log φ{y_i | m_c(x_i), σ²_c(x_i)} ].

In the l-th iteration of the EM algorithm, we have m_c^{(l)}(·), σ_c^{2(l)}(·), and
π_c^{(l)}(·). In the E-step, the expectation of the latent variable z_{ic} is

    r_{ic}^{(l)} = π_c^{(l)}(x_i) φ{y_i | m_c^{(l)}(x_i), σ_c^{2(l)}(x_i)} / Σ_{c'=1}^C π_{c'}^{(l)}(x_i) φ{y_i | m_{c'}^{(l)}(x_i), σ_{c'}^{2(l)}(x_i)}.  (2.3)

Let {u_1, …, u_N} be the set of grid points at which the unknown functions are evaluated, where N is the number of grid points. In the M-step, we maximize

    Σ_{i=1}^n Σ_{c=1}^C r_{ic}^{(l)} [ log π_{c0}(x_0) + log φ{y_i | m_{c0}(x_0), σ²_{c0}(x_0)} ] K_h(x_i − x_0),  (2.4)

for x_0 = u_j, j = 1, …, N. In practice, if n is not very large, one may directly take the observed {x_1, …, x_n} as the grid points, in which case N = n.

The maximization of (2.4) is equivalent to maximizing, separately for c = 1, …, C,

    Σ_{i=1}^n r_{ic}^{(l)} log π_{c0}(x_0) K_h(x_i − x_0),  (2.5)

and

    Σ_{i=1}^n r_{ic}^{(l)} log φ{y_i | m_{c0}(x_0), σ²_{c0}(x_0)} K_h(x_i − x_0).  (2.6)

The solution of (2.5) is, for x_0 ∈ {u_j, j = 1, …, N},

    π_{c0}^{(l+1)}(x_0) = Σ_{i=1}^n r_{ic}^{(l)} K_h(x_i − x_0) / Σ_{i=1}^n K_h(x_i − x_0).  (2.7)

The closed-form solution of (2.6) is, for x_0 ∈ {u_j, j = 1, …, N},

    m_{c0}^{(l+1)}(x_0) = Σ_{i=1}^n w_{ic}^{(l)} y_i / Σ_{i=1}^n w_{ic}^{(l)},  (2.8)

    σ_{c0}^{2(l+1)}(x_0) = Σ_{i=1}^n w_{ic}^{(l)} {y_i − m_{c0}^{(l+1)}(x_0)}² / Σ_{i=1}^n w_{ic}^{(l)},  (2.9)

where w_{ic}^{(l)} = r_{ic}^{(l)} K_h(x_i − x_0). Furthermore, we update π_c^{(l+1)}(x_i), m_c^{(l+1)}(x_i) and σ_c^{2(l+1)}(x_i), i = 1, …, n, by linearly interpolating π_{c0}^{(l+1)}(u_j), m_{c0}^{(l+1)}(u_j) and σ_{c0}^{2(l+1)}(u_j), j = 1, …, N, respectively. With initial values of π_c(·), m_c(·) and σ²_c(·), the proposed estimation procedure is summarized in the following algorithm.

An EM algorithm:
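One iteration of the modified EM algorithm, that is, the E-step (2.3) followed by the closed-form updates (2.7)-(2.9) at every grid point, can be sketched as follows. The array shapes and the use of the Epanechnikov kernel are implementation assumptions; the shared responsibilities across all grid points are what keep the component labels aligned.

```python
import numpy as np

def normal_pdf(y, mu, s2):
    return np.exp(-(y - mu) ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def em_step(x, y, grid, h, pi_l, m_l, s2_l):
    """One iteration of the modified EM algorithm.

    pi_l, m_l, s2_l are (C, n) arrays holding the current values
    pi_c(x_i), m_c(x_i), sigma_c^2(x_i).  Returns updated (C, N)
    arrays at the grid points via (2.3) and (2.7)-(2.9)."""
    # E-step (2.3): one shared set of responsibilities for all grid points.
    num = pi_l * normal_pdf(y, m_l, s2_l)              # (C, n)
    r = num / num.sum(axis=0, keepdims=True)
    C, N = m_l.shape[0], len(grid)
    pi_new = np.empty((C, N)); m_new = np.empty((C, N)); s2_new = np.empty((C, N))
    for j, u in enumerate(grid):
        K = epanechnikov((x - u) / h) / h              # K_h(x_i - u_j); h must be
        w = r * K                                      # large enough that K.sum() > 0
        pi_new[:, j] = w.sum(axis=1) / K.sum()                             # (2.7)
        m_new[:, j] = (w * y).sum(axis=1) / w.sum(axis=1)                  # (2.8)
        s2_new[:, j] = (w * (y - m_new[:, [j]]) ** 2).sum(axis=1) / w.sum(axis=1)  # (2.9)
    return pi_new, m_new, s2_new

# Illustrative call with two well-separated flat components:
x = np.linspace(0.0, 1.0, 20)
y = np.tile([0.0, 2.0], 10)
pi_l = np.full((2, 20), 0.5)
m_l = np.vstack([np.zeros(20), np.full(20, 2.0)])
s2_l = np.full((2, 20), 0.25)
grid = np.array([0.3, 0.5, 0.7])
pi_new, m_new, s2_new = em_step(x, y, grid, 0.6, pi_l, m_l, s2_l)
```

By construction the updated proportions at each grid point sum to one, and with well-separated components the updated means stay close to the component centers.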
Initial Value: Obtain initial values for π_c(·), m_c(·) and σ²_c(·) by fitting a mixture of polynomial regressions. Denote the initial values by π_c^{(1)}(·), m_c^{(1)}(·) and σ_c^{2(1)}(·). Set l = 1.

E-step: Use (2.3) to calculate r_{ic}^{(l)} for i = 1, …, n and c = 1, …, C.

M-step: For c = 1, …, C and j = 1, …, N, evaluate π_c^{(l+1)}(u_j) by (2.7), m_c^{(l+1)}(u_j) by (2.8) and σ_c^{2(l+1)}(u_j) by (2.9). Further obtain π_c^{(l+1)}(x_i), m_c^{(l+1)}(x_i) and σ_c^{2(l+1)}(x_i) by linear interpolation.

Iterate the E-step and the M-step for l = 1, 2, … until the algorithm converges.

It is well known that an ordinary EM algorithm for parametric models possesses an ascent property, which is a desirable property. The modified EM algorithm can be regarded as an extension of the EM algorithm from parametric models to nonparametric models, so it is of interest to study whether the modified EM algorithm still preserves the ascent property. Let θ^{(l)}(·) = {π^{(l)}(·), σ^{2(l)}(·), m^{(l)}(·)} be the functions estimated at the l-th step of the proposed EM algorithm. Denote θ_0 = (π_0^T, σ_0^{2T}, m_0^T)^T. We rewrite the local log-likelihood function (2.2) as

    ℓ_n(θ_0) = Σ_{i=1}^n ℓ(θ_0, y_i) K_h(x_i − x_0).  (2.10)

Theorem 2. For any given point x_0 in the interval of x, suppose that θ^{(l)}(·) has a continuous first derivative, and that nh → ∞ and h → 0 as n → ∞. Then

    lim inf_{n→∞} n^{−1} [ ℓ_n{θ^{(l+1)}(x_0)} − ℓ_n{θ^{(l)}(x_0)} ] ≥ 0  (2.11)

in probability.

The proof of Theorem 2 is given in section 5. Theorem 2 implies that when the sample size n is large enough, the proposed algorithm possesses the ascent property at any given x_0. In particular, at the grid points {u_1, …, u_N} where the unknown curves are evaluated, we have for large n, ℓ_n{θ^{(l+1)}(u_j)} − ℓ_n{θ^{(l)}(u_j)} ≥ 0.
3. Simulation and Application

In this section, we address practical implementation issues, such as a standard error formula and bandwidth selection, for nonparametric mixture of regression models. To assess the performance of the estimates of the unknown mean functions m_c(x), we consider the square root of the average squared errors (RASE),

    RASE²_m = N^{−1} Σ_{c=1}^C Σ_{j=1}^N {m̂_c(u_j) − m_c(u_j)}²,

where {u_j, j = 1, …, N} are the grid points at which the unknown functions m_c(·) are evaluated. For simplicity, the grid points are taken evenly over the range of the x-variable. RASEs for the variance functions σ²_c(x) and the proportion functions π_c(x) are defined similarly and denoted by RASE_σ and RASE_π, respectively.

3.1. Standard Error Formula

Define the fitted value for the i-th observation as a weighted sum of the estimated means,

    ŷ_i = Σ_{c=1}^C r_{ic} m̂_c(x_i),

where r_{ic} is given by (2.3) at convergence of the effective EM algorithm. The residual is e_i = y_i − ŷ_i. Rewrite the estimate of m_c(x) in the proposed algorithm as m̂_c(x) = (E^T W_c E)^{−1} E^T W_c y, where E is an n × 1 vector of ones, and W_c = diag{w_{1c}, …, w_{nc}} with w_{ic} = r_{ic} K_h(x_i − x). We consider the following approximate standard error formula for m̂_c(x):

    Var{m̂_c(x)} = (E^T W_c E)^{−1} E^T W_c Ĉov(y) W_c E (E^T W_c E)^{−1},  (3.1)

where Ĉov(y) = diag{e²_1, e²_2, …, e²_n}, a diagonal matrix of the squared residuals. Furthermore, (3.1) can be written as

    Var{m̂_c(x)} = Σ_{i=1}^n w²_{ic} e²_i / (Σ_{i=1}^n w_{ic})².  (3.2)
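Formula (3.2) is just a ratio of kernel-weighted sums and is straightforward to compute once the converged responsibilities and fitted values are available. A minimal sketch, with the kernel passed in as a function and all numeric inputs purely illustrative:

```python
import numpy as np

def mhat_se(x, y, yhat, r_c, x0, h, kern):
    """Approximate standard error (3.2) of the c-th estimated mean at x0.

    r_c  : responsibilities r_ic for component c at EM convergence
    yhat : fitted values yhat_i = sum_c r_ic * mhat_c(x_i)"""
    e = y - yhat                              # residuals e_i = y_i - yhat_i
    w = r_c * kern((x - x0) / h) / h          # w_ic = r_ic K_h(x_i - x0)
    return np.sqrt(np.sum(w ** 2 * e ** 2) / np.sum(w) ** 2)

# Illustrative inputs (Gaussian kernel for the demonstration):
gauss = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
x = np.array([0.1, 0.2, 0.3, 0.4])
y = np.array([1.0, 1.2, 0.9, 1.1])
yhat = np.array([1.05, 1.1, 1.0, 1.05])
r_c = np.array([0.9, 0.8, 0.95, 0.85])
se1 = mhat_se(x, y, yhat, r_c, 0.25, 0.2, gauss)
```

One sanity check on the sandwich form: doubling every residual doubles the resulting standard error, since the weights are unchanged.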
3.2. Bandwidth Selection

Bandwidth selection is fundamental to nonparametric smoothing. In this section we propose a multifold cross-validation (CV) method to choose the bandwidth. Denote by D the full data set, and partition D into a training set R_j and a testing set T_j, with D = T_j ∪ R_j, j = 1, …, J. We use the training set R_j to obtain the estimates {m̂_c(·), σ̂²_c(·), π̂_c(·)}, and then estimate m_c(x), σ²_c(x) and π_c(x) for the data points in the corresponding testing set. For (x_l, y_l) ∈ T_j,

    m̂_c(x_l) = Σ_{i: x_i ∈ R_j} r_{ic} K_h(x_i − x_l) y_i / Σ_{i: x_i ∈ R_j} r_{ic} K_h(x_i − x_l),

    σ̂²_c(x_l) = Σ_{i: x_i ∈ R_j} r_{ic} K_h(x_i − x_l) {y_i − m̂_c(x_i)}² / Σ_{i: x_i ∈ R_j} r_{ic} K_h(x_i − x_l),

    π̂_c(x_l) = Σ_{i: x_i ∈ R_j} r_{ic} K_h(x_i − x_l) / Σ_{i: x_i ∈ R_j} K_h(x_i − x_l).

Based on the estimates m̂_c(x_l) in the test set T_j, we then calculate the probability memberships in T_j: for (x_l, y_l) ∈ T_j and c = 1, …, C,

    r_{lc} = π̂_c(x_l) φ{y_l | m̂_c(x_l), σ̂²_c(x_l)} / Σ_{q=1}^C π̂_q(x_l) φ{y_l | m̂_q(x_l), σ̂²_q(x_l)}.

Now we can implement the regular CV criterion in the proposed model,

    CV = Σ_{j=1}^J Σ_{l ∈ T_j} (y_l − ŷ_l)²,  (3.3)

where ŷ_l = Σ_{c=1}^C r_{lc} m̂_c(x_l) is the predicted value of y_l in the test set T_j.

3.3. Simulation Study

In the following example, we conduct simulations for a 2-component nonparametric mixture of regressions model with

    π_1(x) = exp(0.5x)/{1 + exp(0.5x)},  π_2(x) = 1 − π_1(x),
    m_1(x) = 4 − sin(2πx),  m_2(x) = cos(3πx),
    σ_1(x) = 0.6 exp(0.5x),  σ_2(x) = 0.5 exp(−0.2x).

For each of the sample sizes n = 200, 400, 800, 500 simulations were conducted. The predictor x is generated from the uniform distribution on [0, 1]. The Epanechnikov kernel is used in our simulations.
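The multifold CV criterion (3.3) of section 3.2 can be sketched as below. The `fit` and `predict` hooks are hypothetical stand-ins for the EM estimation of section 2.3 and the mixture-mean predictor ŷ_l; to keep the illustration self-contained, the usage plugs in the special case C = 1, in which the fit reduces to plain kernel (Nadaraya-Watson) regression.

```python
import numpy as np

def cv_bandwidth(x, y, bandwidths, fit, predict, n_folds=5):
    """Multifold CV criterion (3.3): for each candidate bandwidth h, sum
    squared prediction errors over held-out folds; return the minimizer."""
    folds = np.arange(len(x)) % n_folds              # deterministic fold labels
    cv = []
    for h in bandwidths:
        err = 0.0
        for j in range(n_folds):
            test = folds == j
            model = fit(x[~test], y[~test], h)       # train on R_j
            yhat = predict(model, x[test])           # predict on T_j
            err += np.sum((y[test] - yhat) ** 2)     # inner sum of (3.3)
        cv.append(err)
    return bandwidths[int(np.argmin(cv))]

# C = 1 stand-ins: Nadaraya-Watson regression with a Gaussian kernel.
gauss = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def fit(x_train, y_train, h):
    return x_train, y_train, h

def predict(model, x_test):
    x_train, y_train, h = model
    K = gauss((x_test[:, None] - x_train[None, :]) / h)
    return (K * y_train).sum(axis=1) / K.sum(axis=1)
```

On a smooth noiseless curve the criterion favors the smaller of two widely separated candidate bandwidths, since the larger one oversmooths.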
Table 3.1: RASEs: Mean and Standard Deviation
(columns: n, h, and Mean(Std) of RASE_m, RASE_σ², RASE_π; one row per (n, h) combination)
(0.0964) (0.0918) (0.0393)
(0.0682) (0.0706) (0.0325)
(0.0992) (0.1431) (0.0237)
(0.0723) (0.0622) ( )
(0.0659) (0.0517) (0.0276)
(0.0683) (0.0552) (0.0190)
(0.0562) (0.0465) (0.0312)
(0.0440) (0.0476) (0.0218)
(0.0368) (0.0395) (0.0149)

Table 3.2: Standard error of the unknown mean functions
(columns: n, x, then SD and SE(Std) for m_1(x) and for m_2(x))
(h=0.08) (0.0371) (0.0315) (0.0468) (0.0338)
(h=0.06) (0.0247) (0.0222) (0.0290) (0.0256)
(h=0.04) (0.0190) (0.0164) (0.0210) (0.0154)

It is well known that the EM algorithm may be trapped at a local maximizer and hence may be sensitive to the initial value. To obtain a good initial value, we first fit a mixture of polynomial regression models and obtain estimates of the mean functions m_c(x) and of the parameters σ²_c and π_c. We then set the initial values m_c^{(1)}(x), σ_c^{2(1)}(x) and π_c^{(1)}(x) to these estimates. To demonstrate that the proposed procedure works well for a wide range of bandwidths, and to save computing time, we use the following strategy rather than running the cross-validation bandwidth selector on every generated sample in our simulation study: for a given sample size, we first generate several simulated data sets and use the CV bandwidth selector to choose a bandwidth for each data set. This provides an idea of the
[Figure 3.2 about here.]

Figure 3.2: (a) A typical sample of simulated data (n = 400) and the plot of the true mean functions; (b) the estimated mean functions (solid lines), true mean functions (dashed lines), and 95% pointwise confidence intervals (dotted lines), with n = 400.

optimal bandwidth for a given sample size. We then consider three different bandwidths: half of the selected bandwidth, the selected bandwidth, and twice the selected bandwidth, corresponding to under-smoothing, appropriate smoothing and over-smoothing, respectively. Table 3.1 displays the mean and standard deviation of the RASEs over the 500 simulations. From Table 3.1, the proposed procedure performs well for a wide range of bandwidths.

We next test the accuracy of the standard error formula. Table 3.2 summarizes the simulation results for the unknown mean functions m_c(x) at two fixed points, one of which is x = 0.30. The standard deviation of the 500 estimates, denoted by SD, can be viewed as the true standard error. We then calculate the sample average and standard deviation of the 500 estimated standard errors from formula (3.1), denoted by SE(Std). Although some underestimation is present, the proposed sandwich formula works reasonably well: the difference between the true value and the estimate is less than twice the standard error of the estimate.

We now illustrate the performance of the proposed procedure using a typical simulated sample with n = 400, selected as the sample with the median RASE_m among the 500 simulation samples. Figure 3.2(a) shows the
scatter plot of the typical sample data and the true mean functions.

Before analyzing this data set with nonparametric mixture of regression models, we first briefly discuss how to determine the number of components C. Note that conditional on X = x, the nonparametric mixture of regression model (2.1) is a univariate mixture of normals. Over a local neighborhood of x, m_c(x) may be considered approximately constant; thus, to choose C for nonparametric mixture of regression models, we suggest an analysis based on modeling the response variable y of the local data as a univariate mixture of normals. An illustration is given in Figure 3.2(a): we select a local region with relatively large variance in the y direction, shown as the rectangle. Traditional methods can then be used to determine C. We fit the local data using univariate mixtures of normals with one, two, three, and four components; BIC selects the two-component model (the BIC values for one, two, and three components are 83.29, 64.19, and 68.43).

We use the cross-validation (CV) criterion proposed in section 3.2 to select a bandwidth. The resulting estimate with n = 400, along with its pointwise confidence intervals, is depicted in Figure 3.2(b), from which we can see that the true mean functions lie within the confidence intervals. This suggests that the proposed estimation procedure performs well at this moderate sample size.

3.4. An Application

We illustrate the proposed model and estimation procedure with an analysis of a real data set containing the monthly change of the S&P/Case-Shiller House Price Index (HPI) and the monthly growth rate of the United States Gross Domestic Product (GDP), starting in January 1990. In the economics literature, HPI is a measure of a nation's housing cost and GDP is a measure of the size of a nation's economy. It is of interest to investigate the impact of the GDP growth rate on the HPI change.
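The BIC-based local choice of C described in section 3.3 above, i.e. fitting univariate mixtures of normals to the local responses and comparing BIC values, can be sketched as follows. The EM fit with a deterministic quantile initialization and the small variance floor are illustrative implementation details, not prescribed by the paper.

```python
import numpy as np

def gmm_bic(y, C, n_iter=300):
    """Fit a C-component univariate mixture of normals by EM and return
    BIC = -2*loglik + (3C - 1)*log(n), matching the 3C - 1 free
    parameters (C means, C variances, C - 1 proportions)."""
    n = len(y)
    mu = np.quantile(y, (np.arange(C) + 0.5) / C)    # initial means (quantiles)
    s2 = np.full(C, np.var(y) + 1e-8)                # initial variances
    pi = np.full(C, 1.0 / C)                         # initial proportions
    for _ in range(n_iter):
        dens = pi[:, None] * np.exp(-(y - mu[:, None]) ** 2 / (2.0 * s2[:, None])) \
               / np.sqrt(2.0 * np.pi * s2[:, None])  # (C, n) weighted densities
        r = dens / dens.sum(axis=0, keepdims=True)   # responsibilities
        nk = r.sum(axis=1)
        pi, mu = nk / n, (r * y).sum(axis=1) / nk
        s2 = (r * (y - mu[:, None]) ** 2).sum(axis=1) / nk + 1e-8  # variance floor
    dens = pi[:, None] * np.exp(-(y - mu[:, None]) ** 2 / (2.0 * s2[:, None])) \
           / np.sqrt(2.0 * np.pi * s2[:, None])
    loglik = np.sum(np.log(dens.sum(axis=0)))
    return -2.0 * loglik + (3 * C - 1) * np.log(n)
```

Applied to local data with two clearly separated clusters, the two-component fit attains a smaller BIC than the one-component fit, mirroring the selection made in the text.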
As expected, the impact of the GDP growth rate on HPI may show different patterns in different macroeconomic cycles. In this illustration, we take the HPI change as the response variable and the GDP growth rate as the predictor. The scatterplot of this data set is depicted in Figure 3.3(a). We first determine the number of components C using the local data within the rectangle in Figure 3.3(a). The BIC values for univariate mixture of normals models
with one, two and three components are 13.46, 2.34 and 2.57, respectively; thus, a two-component model is selected. We next select the bandwidth by the 5-fold CV selector described in (3.3). With the selected bandwidth we fit the data with a 2-component nonparametric mixture of regression model. The estimated mean functions are shown in Figure 3.3(b). We further depict the hard-clustering result, obtained by assigning each observation the component identity with the largest r_{ic}, c = 1, …, C. The triangular points in the lower cluster are mainly from January 1990 to September 1997. The circle points in the upper cluster are mainly from October 1997 to December 2002, during which the economy experienced an internet boom and the following bust. From the result, we observe that in the upper component the HPI change tends to be lower when GDP growth is modest. The estimated mixing proportion function is plotted in Figure 3.3(c), and the estimated variance functions are plotted in Figure 3.3(d).

We compare the proposed nonparametric finite mixture of regression models with two alternatives: a local linear regression model and a 2-component mixture of linear regression models. The bandwidth for local linear regression is selected by a plug-in method (Ruppert et al., 1995), and the mixture of linear regression models is estimated by an EM algorithm. We randomly select 5% of the data as a test set and use the rest as a training set. Estimates for the three models are obtained from the training set, and prediction errors are computed from the deviations on the test set. This procedure is repeated 100 times, and the average prediction errors of the three methods are compared.
The results show that the proposed model achieves a 7.26% reduction in average prediction error relative to the mixture of linear regression models and a 57.43% reduction relative to local linear regression. The improved prediction demonstrates the advantages of the nonparametric finite mixture of regression models in the analysis of the US house price index data.

4. Discussion

In this paper, we proposed a class of nonparametric finite mixture of regression models in which the mixing proportions, the mean functions and the variance functions are all nonparametric functions of the covariate. We proposed
[Figure 3.3 about here.]

Figure 3.3: (a) Scatterplot of the US house price index data (HPI change versus GDP growth rate); (b) estimated mean functions with 95% confidence intervals and the hard-clustering result; (c) estimated mixing proportion function of the upper component; (d) estimated variance functions of the upper and lower components.
an estimation procedure for the newly proposed model by kernel regression, and further studied the sampling properties of the resulting estimate: we derived its asymptotic bias and asymptotic variance, and further established its asymptotic normality. We proposed an effective EM algorithm to compute the local maximum likelihood estimate, and studied the ascent property of the proposed algorithm. We found that for large samples, the effective EM algorithm still possesses the monotone ascent property. The performance of the proposed procedures was examined empirically by a Monte Carlo simulation and illustrated by an analysis of a real data example.

In this paper, we focused on estimation of the newly proposed nonparametric finite mixture of regression models when x_0 is an interior point of the range of the covariate. It is certainly of interest to study the boundary effects of the proposed procedure; boundary effects have been studied in Cheng, Fan and Marron (1997) for the nonparametric regression model. It is also of interest to test whether the mixing proportions are constant, whether some of the mean functions are constant or of specific parametric forms, and whether the variance functions are parametric. These are topics for future research.

5. Proofs

In this section we outline the key steps of the proofs of Theorems 1 and 2. Note that θ = (π^T, σ^{2T}, m^T)^T is a (3C−1) × 1 vector. Whenever necessary, we rewrite θ = (θ_1, …, θ_{3C−1})^T without changing the order of π, σ² and m. Otherwise, we use the same notation as defined in section 2.

Regularity Conditions

A. The sample {(x_i, y_i), i = 1, …, n} is independent and identically distributed from its population (X, Y). The support of X, denoted by 𝒳, is a closed and bounded subset of R¹. The density η{y | θ(x_0)} is identifiable up to a permutation of the mixture components.

B. The marginal density function f(x) has a continuous first derivative and is positive for x ∈ 𝒳.

C.
There exists a function M(x, y) with E{M(X, Y)} < ∞ such that |∂³ℓ(θ, y)/∂θ_j∂θ_k∂θ_l| ≤ M(x, y) for all x, y, and all θ in a neighborhood of θ(x_0).

D. The unknown function θ(x) has a continuous second derivative.
E. The kernel function K(·) has bounded support and satisfies

    ∫ K(t) dt = 1,  ∫ t K(t) dt = 0,  ∫ t² K(t) dt = κ_2 > 0.

F. The following conditions hold for all i and j:

    E(|∂ℓ{θ(x_0), Y}/∂θ_j|³) < ∞,  E[{∂²ℓ{θ(x_0), Y}/∂θ_i∂θ_j}²] < ∞.

All of these are mild conditions that have been used in the literature on local likelihood estimation and mixture models. The following lemma is used in the proof of Theorem 1.

Lemma 1. Under Conditions A, C, and F, for x_0 in the interior of 𝒳, it holds that

    E[q_1{θ(x_0), Y} | X = x_0] = 0,  (5.1)

    E[q_2{θ(x_0), Y} | X = x_0] = −E[q_1{θ(x_0), Y} q_1{θ(x_0), Y}^T | X = x_0].  (5.2)

Proof. Conditional on X = x_0, Y follows a finite mixture of normals, so (5.1) holds by direct calculation. Furthermore, (5.2) follows from regularity conditions C and F together with the arguments on page 39 of McLachlan and Peel (2000). This completes the proof of the lemma.

We refer to (5.1) and (5.2) as the local Bartlett first and second identities, respectively. Identity (5.1) implies that Λ(x_0) = 0.

Proof of Theorem 1. Denote

    m*_c = √(nh){m_{c0} − m_c(x_0)},  σ²*_c = √(nh){σ²_{c0} − σ²_c(x_0)},  π*_c = √(nh){π_{c0} − π_c(x_0)},
    π*_C = √(nh){π_{C0} − π_C(x_0)} = −√(nh) Σ_{c=1}^{C−1} {π_{c0} − π_c(x_0)}.

Let m* = (m*_1, …, m*_C)^T, σ²* = (σ²*_1, …, σ²*_C)^T, and π* = (π*_1, …, π*_{C−1})^T, and denote θ* = (π*^T, σ²*^T, m*^T)^T. Recall that

    ℓ{θ(x_0), y} = log η{y | θ(x_0)} = log[ Σ_{c=1}^C π_c(x_0) φ{y | m_c(x_0), σ²_c(x_0)} ].
Let

    ℓ{θ(x_0) + γ_n θ*, y} = log[ Σ_{c=1}^C {π_c(x_0) + γ_n π*_c} φ{y | m_c(x_0) + γ_n m*_c, σ²_c(x_0) + γ_n σ²*_c} ].

Thus, if {π̂_0, σ̂²_0, m̂_0} maximizes (2.2), then θ̂* maximizes

    ℓ*_n(θ*) = h Σ_{i=1}^n [ ℓ{θ(x_0) + γ_n θ*, y_i} − ℓ{θ(x_0), y_i} ] K_h(x_i − x_0).  (5.3)

By a Taylor expansion,

    ℓ*_n(θ*) = Δ_n^T θ* − ½ θ*^T Γ_n θ* + (h γ_n³ / 6) R{θ(x_0), ξ̂},  (5.4)

where ξ̂ is a vector between 0 and γ_n θ*, and

    Δ_n = γ_n h Σ_{i=1}^n q_1{θ(x_0), y_i} K_h(x_i − x_0),
    Γ_n = −n^{−1} Σ_{i=1}^n q_2{θ(x_0), y_i} K_h(x_i − x_0),
    R{θ(x_0), ξ̂} = Σ_{j,k,l} Σ_{i=1}^n [∂³ℓ{θ(x_0) + ξ̂, y_i}/∂θ_j∂θ_k∂θ_l] K_h(x_i − x_0).

Denote by Γ_n(i, j) the (i, j) element of Γ_n. By condition E, it can be shown that

    E Γ_n(i, j) = −∫∫ [∂²ℓ{θ(x_0), y}/∂θ_i∂θ_j] η{y | θ(x)} f(x) K_h(x − x_0) dx dy
                = −f(x_0) ∫ [∂²ℓ{θ(x_0), y}/∂θ_i∂θ_j] η{y | θ(x_0)} dy + o(1).

Therefore, E Γ_n = f(x_0) I(x_0) + o(1). The variance Var{Γ_n(i, j)} is dominated by the term

    n^{−1} ∫∫ [∂²ℓ{θ(x_0), y}/∂θ_i∂θ_j]² η{y | θ(x)} f(x) K²_h(x − x_0) dx dy,

which can be shown to be of order O{(nh)^{−1}} under condition F. Therefore, we have Γ_n = f(x_0) I(x_0) + o_p(1).
By Condition C, the expectation of the absolute value of the last term of (5.4) is bounded by

    O( γ_n E[ max_{j,k,l} |∂³ℓ{θ(x_0) + ξ̂, Y}/∂θ_j∂θ_k∂θ_l| K_h(x_i − x_0) ] ) = O(γ_n).  (5.5)

Thus, the last term of (5.4) is of order O_p(γ_n). Therefore, we have

    ℓ*_n(θ*) = Δ_n^T θ* − ½ f(x_0) θ*^T I(x_0) θ* + o_p(1).  (5.6)

Using the quadratic approximation lemma (see, e.g., p. 210 of Fan and Gijbels, 1996), we have

    θ̂* = f(x_0)^{−1} I(x_0)^{−1} Δ_n + o_p(1).  (5.7)

To establish asymptotic normality, it remains to calculate the mean and variance of Δ_n and verify the Lyapounov condition. Note that

    E(Δ_n) = √(nh) ∫∫ q_1{θ(x_0), y} η{y | θ(x)} f(x) K_h(x − x_0) dx dy = √(nh) ∫ Λ(x) f(x) K_h(x − x_0) dx.

Under Conditions C, D and F, Λ(x) has a continuous second derivative. Thus, using the fact that Λ(x_0) = 0 by Lemma 1 and standard arguments in kernel regression, it follows that

    E(Δ_n) = √(nh) f(x_0) { f'(x_0)Λ'(x_0)/f(x_0) + ½ Λ''(x_0) } κ_2 h² + o(√(nh) h²).

For the covariance of Δ_n, we have

    Cov(Δ_n) = h E[ q_1{θ(x_0), Y} q_1{θ(x_0), Y}^T K²_h(X − x_0) ] + o(1),

whose (i, j) element is

    h ∫∫ [∂ℓ{θ(x_0), y}/∂θ_i][∂ℓ{θ(x_0), y}/∂θ_j] K²_h(x − x_0) f(x) η{y | θ(x)} dx dy
    → f(x_0) ν_0 ∫ [∂ℓ{θ(x_0), y}/∂θ_i][∂ℓ{θ(x_0), y}/∂θ_j] η{y | θ(x_0)} dy
    = −f(x_0) ν_0 ∫ [∂²ℓ{θ(x_0), y}/∂θ_i∂θ_j] η{y | θ(x_0)} dy.
The last step holds due to (5.2). Thus, $\mathrm{Cov}(\Delta_n) = f(x_0)\, I(x_0)\, \nu_0 + o(1)$.

To establish asymptotic normality for $\Delta_n$, it is necessary to show that, for any unit vector $d$,
\[
\{ d^T \mathrm{Cov}(\Delta_n)\, d \}^{-1/2}\, d^T \{ \Delta_n - E(\Delta_n) \} \xrightarrow{\ D\ } N(0, 1). \tag{5.8}
\]
Since $\mathrm{Cov}(\Delta_n) = O(1)$, it follows that $d^T \mathrm{Cov}(\Delta_n)\, d = O(1)$. Let $\lambda_i = d^T q_1\{\theta(x_0), y_i\}\, K_h(x_i - x_0)$; then $d^T \Delta_n = h\gamma_n \sum_{i=1}^{n} \lambda_i$. It is therefore sufficient to show that
\[
n h^3 \gamma_n^3\, E(|\lambda_i|^3) = o(1).
\]
By Condition F and arguments similar to (5.5), it can be shown that $n h^3 \gamma_n^3 E(|\lambda_i|^3) = O(\gamma_n) = o(1)$, and thus Lyapounov's condition holds for (5.8). By (5.7) and the Slutsky theorem, we have
\[
\sqrt{nh}\, \{ \gamma_n \hat{\theta}^* - B(x_0) + o_p(h^2) \} \xrightarrow{\ D\ } N\{ 0,\ f^{-1}(x_0)\, \nu_0\, I^{-1}(x_0) \}. \tag{5.9}
\]

Proof of Theorem 2. We assume the unobserved component labels $C_i$, $i = 1, \dots, n$, are a random sample from the population $C$, so that the complete data $\{(x_i, y_i, C_i),\ i = 1, \dots, n\}$ are a random sample from $(X, Y, C)$. Let $h\{y, c \mid \theta(x)\}$ be the joint density of $(Y, C)$ given $X = x$, and let $f(x)$ be the marginal density of $X$. Conditional on $X = x$, $Y$ has density $\eta\{y \mid \theta(x)\}$. Then the conditional density of $C$ given $Y = y$ is
\[
g\{c \mid y, \theta(x)\} = \frac{h\{y, c \mid \theta(x)\}}{\eta\{y \mid \theta(x)\}}. \tag{5.10}
\]
The local log-likelihood function (2.2) can be rewritten as
\[
l_n(\theta_0) = \sum_{i=1}^{n} \log\{\eta(y_i \mid \theta_0)\}\, K_h(x_i - x_0). \tag{5.11}
\]
For fixed $\hat{\theta}^{(l)}(x_i)$, $i = 1, \dots, n$, it is clear that $\int g\{c \mid y_i, \hat{\theta}^{(l)}(x_i)\}\, dc = 1$. The local likelihood (5.11) can then be written as
\[
l_n(\theta) = \sum_{i=1}^{n} \log\{\eta(y_i \mid \theta)\} \Big[ \int g\{c \mid y_i, \hat{\theta}^{(l)}(x_i)\}\, dc \Big] K_h(x_i - x_0)
= \sum_{i=1}^{n} \Big[ \int \log\{\eta(y_i \mid \theta)\}\, g\{c \mid y_i, \hat{\theta}^{(l)}(x_i)\}\, dc \Big] K_h(x_i - x_0). \tag{5.12}
\]
By (5.10), we also have
\[
\log\{\eta(y_i \mid \theta)\} = \log\{h(y_i, c \mid \theta)\} - \log\{g(c \mid y_i, \theta)\}. \tag{5.13}
\]
Thus, we have
\[
l_n(\theta) = \sum_{i=1}^{n} E\big[ \log\{h(y_i, C \mid \theta)\} \,\big|\, \theta^{(l)}(x_i) \big] K_h(x_i - x_0)
- \sum_{i=1}^{n} E\big[ \log\{g(C \mid y_i, \theta)\} \,\big|\, \theta^{(l)}(x_i) \big] K_h(x_i - x_0), \tag{5.14}
\]
where $\theta^{(l)}(x_i)$ is the $l$-th step local estimate at $x_i$, and the expectation is taken with respect to $g\{c \mid y_i, \theta^{(l)}(x_i)\}$; taking this expectation is equivalent to computing (2.3). In the M-step, we choose $\theta^{(l+1)}(x_0)$ such that
\[
\frac{1}{n} \sum_{i=1}^{n} E\big[ \log h\{y_i, C \mid \theta^{(l+1)}(x_0)\} \,\big|\, \theta^{(l)}(x_i) \big] K_h(x_i - x_0)
\ \geq\ \frac{1}{n} \sum_{i=1}^{n} E\big[ \log h\{y_i, C \mid \theta^{(l)}(x_0)\} \,\big|\, \theta^{(l)}(x_i) \big] K_h(x_i - x_0).
\]
It suffices to show that
\[
\limsup_{n \to \infty}\ \frac{1}{n} \sum_{i=1}^{n} E\Big[ \log \frac{g\{C \mid y_i, \theta^{(l+1)}(x_0)\}}{g\{C \mid y_i, \theta^{(l)}(x_0)\}} \,\Big|\, \theta^{(l)}(x_i) \Big] K_h(x_i - x_0) \leq 0 \tag{5.15}
\]
in probability. Denote
\[
L_g = \frac{1}{n} \sum_{i=1}^{n} E\Big[ \log \frac{g\{C \mid y_i, \theta^{(l+1)}(x_0)\}}{g\{C \mid y_i, \theta^{(l)}(x_0)\}} \,\Big|\, \theta^{(l)}(x_i) \Big] K_h(x_i - x_0)
\]
and
\[
L_J = \frac{1}{n} \sum_{i=1}^{n} \log\Big( E\Big[ \frac{g\{C \mid y_i, \theta^{(l+1)}(x_0)\}}{g\{C \mid y_i, \theta^{(l)}(x_0)\}} \,\Big|\, \theta^{(l)}(x_i) \Big] \Big) K_h(x_i - x_0).
\]
By Jensen's inequality, $L_g \leq L_J$. Next we show that $L_J \to 0$ in probability. To this end, we first calculate the expectation of $L_J$:
\[
E(L_J) = E\Big( \log\Big[ \int_{\mathcal{C}} \frac{g\{c \mid Y, \theta^{(l+1)}(x_0)\}}{g\{c \mid Y, \theta^{(l)}(x_0)\}}\, g\{c \mid Y, \theta^{(l)}(X)\}\, dc \Big] K_h(X - x_0) \Big),
\]
which tends to 0 by a standard argument. We next calculate the variance of $L_J$, which is dominated by the term
\[
\frac{1}{n}\, E\Big( \Big[ \log \int_{\mathcal{C}} \frac{g\{c \mid Y, \theta^{(l+1)}(x_0)\}}{g\{c \mid Y, \theta^{(l)}(x_0)\}}\, g\{c \mid Y, \theta^{(l)}(X)\}\, dc \Big]^2 K_h^2(X - x_0) \Big),
\]
which can be shown to be of order $O\{(nh)^{-1}\}$. Then we have $L_J = o_p(1)$ by the Chebyshev inequality. This completes the proof.

Acknowledgments

Huang's research was supported by National Science Foundation grant DMS and National Institute on Drug Abuse grant P50 DA10075 as a research assistant during his graduate study at the Pennsylvania State University. Li's research was supported by National Institute on Drug Abuse grant R21 DA. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDA or the NIH.

References

Cheng, M. Y., Fan, J. and Marron, J. S. (1997). On Automatic Boundary Corrections. Annals of Statistics 25.

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B 39.

Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman and Hall, London.

Fan, J. (1993). Local Linear Regression Smoothers and Their Minimax Efficiencies. Annals of Statistics 21.

Fan, J. and Gijbels, I. (1992). Variable Bandwidth and Local Linear Regression Smoothers. Annals of Statistics 20.

Fan, J. and Yao, Q. (1998). Efficient Estimation of Conditional Variance Functions in Stochastic Regression. Biometrika 85.

Frühwirth-Schnatter, S. (2001). Markov Chain Monte Carlo Estimation of Classical and Dynamic Switching and Mixture Models. Journal of the American Statistical Association 96.

Frühwirth-Schnatter, S. (2005). Finite Mixture and Markov Switching Models. Springer.
Green, P. J. and Richardson, S. (2002). Hidden Markov Models and Disease Mapping. Journal of the American Statistical Association 97.

Goldfeld, S. M. and Quandt, R. E. (1976). A Markov Model for Switching Regression. Journal of Econometrics 1.

Hathaway, R. J. (1985). A Constrained Formulation of Maximum-Likelihood Estimation for Normal Mixture Distributions. Annals of Statistics 13.

Hennig, C. (2000). Identifiability of Models for Clusterwise Linear Regression. Journal of Classification 17.

Lindsay, B. G. (1995). Mixture Models: Theory, Geometry, and Applications. Institute of Mathematical Statistics, Hayward.

McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. Wiley, New York.

Ruppert, D., Sheather, S. J. and Wand, M. P. (1995). An Effective Bandwidth Selector for Local Least Squares Regression. Journal of the American Statistical Association 90.

Rossi, P. E., Allenby, G. M. and McCulloch, R. (2005). Bayesian Statistics and Marketing. Wiley, Chichester.

Wang, P., Puterman, M. L., Cockburn, I. and Le, N. (1996). Mixed Poisson Regression Models with Covariate Dependent Rates. Biometrics 52.

Wedel, M. and DeSarbo, W. S. (1993). A Latent Class Binomial Logit Methodology for the Analysis of Paired Comparison Data. Decision Sciences 24.

Department of Statistics, Shanghai University of Finance and Economics, Shanghai, P. R. China; E-mail: mzh136@stat.psu.edu

Department of Statistics and The Methodology Center, The Pennsylvania State University, University Park, Pennsylvania, USA; E-mail: rli@stat.psu.edu
More informationBAYESIAN DECISION THEORY
Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will
More informationObtaining Critical Values for Test of Markov Regime Switching
University of California, Santa Barbara From the SelectedWorks of Douglas G. Steigerwald November 1, 01 Obtaining Critical Values for Test of Markov Regime Switching Douglas G Steigerwald, University of
More informationModel-based cluster analysis: a Defence. Gilles Celeux Inria Futurs
Model-based cluster analysis: a Defence Gilles Celeux Inria Futurs Model-based cluster analysis Model-based clustering (MBC) consists of assuming that the data come from a source with several subpopulations.
More informationTesting Statistical Hypotheses
E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions
More information