NONPARAMETRIC MIXTURE OF REGRESSION MODELS


Huang, M., & Li, R.
The Pennsylvania State University Technical Report Series #09-93
College of Health and Human Development, The Pennsylvania State University

Mian Huang
Department of Statistics, Shanghai University of Finance and Economics, Shanghai, P. R. China

Runze Li
Department of Statistics and The Methodology Center, The Pennsylvania State University, University Park, PA, USA

Abstract: Motivated by an analysis of US house price index data, we propose nonparametric finite mixture of regression models. We develop an estimation procedure for the newly proposed models using the EM algorithm and kernel regression. The asymptotic properties of the proposed estimate are systematically studied: we derive its asymptotic bias and variance, and we further establish its asymptotic normality. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed estimation procedures. The proposed methodology is illustrated with an empirical analysis of the US house price index data.

Key words and phrases: EM algorithm, kernel regression, mixture of regression models, nonparametric regression.

1. Introduction

Mixture models have been widely used in econometrics and social science, and the theory of mixture models has been well studied (Lindsay, 1995). As a useful class of mixture models, finite mixtures of linear regression models have been applied in various fields since their introduction by Goldfeld and Quandt (1976). For example, there are applications in econometrics and marketing (Wedel and DeSarbo, 1993; Frühwirth-Schnatter, 2001; Rossi et al., 2005), in epidemiology (Green and Richardson, 2002), and in biology (Wang et al., 1996). Bayesian approaches for mixture regression models are summarized in Frühwirth-Schnatter (2005).

Much effort has been devoted to these models and their extensions, such as finite mixtures of generalized linear models; this work is comprehensively summarized in McLachlan and Peel (2000).

Motivated by an analysis of US house price index data in Section 3.4, we propose nonparametric finite mixtures of regression models. Compared with finite mixtures of linear regression models, the newly proposed models relax the linearity assumption on the regression function and allow the regression function in each component to be an unknown but smooth function of its covariates. In this paper, we consider the situation in which the mixing proportions, the mean functions, and the variance functions are all nonparametric. Using a kernel regression technique, we develop an estimation procedure for the unknown functions in the nonparametric mixture of regression models via a local likelihood approach. The sampling properties of the proposed estimation procedure are investigated: we derive the asymptotic bias and variance of the local likelihood estimate and establish its asymptotic normality.

For the estimation procedure, one may naively implement an EM algorithm (Dempster et al., 1977) for maximizing the local likelihood function. However, one typically wants to estimate the whole curves by evaluating the resulting estimate over a set of grid points, and a naive implementation of the EM algorithm does not ensure that the component labels match correctly across different grid points. This is similar to the label switching problem in other applications of mixture modeling. We therefore modify the EM algorithm to simultaneously maximize the local likelihood functions for the proposed nonparametric mixture of regression models over the set of grid points. The modified EM algorithm works well in our simulations and in a real data example, and we further study its ascent property. We derive a standard error formula for the resulting estimate by the conventional sandwich formula, and we propose a bandwidth selector for the local likelihood estimate using a multi-fold cross-validation method. A simulation study is conducted to examine the performance of the proposed procedure and to test the accuracy of the proposed standard error formula. We further demonstrate the proposed model and estimation procedure by an empirical analysis of US house price index data.

The rest of this paper is structured as follows.

In Section 2, we present the nonparametric finite mixture of regression models, derive asymptotic properties of the local likelihood estimate, and develop an estimation procedure for the proposed model using the EM algorithm and kernel regression. Simulation results and an empirical analysis of a real data set are presented in Section 3. Some discussions are provided in Section 4. Technical conditions and proofs are given in Section 5.

2. Estimation Procedure and its Sampling Properties

2.1. Model Definition

Suppose that {(x_i, y_i), i = 1, ..., n} is a random sample from the population (X, Y). Throughout this paper, we assume X is univariate. The proposed methodology and theoretical results can be extended to multivariate X, but the extension is less useful due to the curse of dimensionality. Let C be a latent class variable, and assume that, conditional on X = x, C has a discrete distribution P(C = c | X = x) = π_c(x) for c = 1, 2, ..., C. Conditional on C = c and X = x, Y follows a normal distribution with mean m_c(x) and variance σ_c²(x), where m_c(·) and σ_c²(·) are unknown but smooth functions. In other words, conditional on X = x, the response variable Y follows a finite mixture of normals

\sum_{c=1}^{C} \pi_c(x) \, N\{m_c(x), \sigma_c^2(x)\}.   (2.1)

In this paper, we assume that C is fixed, and we refer to model (2.1) as a nonparametric finite mixture of regression models because π_c(·), m_c(·), and σ_c²(·) are nonparametric. We discuss how to determine C in Section 3. When π_c(x) and σ_c²(x) are constant and m_c(x) is linear in x, model (2.1) reduces to a finite mixture of linear regression models (Goldfeld and Quandt, 1976). When C = 1, model (2.1) is a nonparametric regression model; see Chapter 2 of Fan and Gijbels (1996) for a review of nonparametric regression models. Thus, model (2.1) can be regarded as a natural extension of both nonparametric regression models and finite mixtures of linear regression models. It is easy to adapt the conventional constraints imposed on finite mixtures of linear regression models so that the proposed model is identifiable and the corresponding likelihood function is bounded; for further references on these issues, see Hennig (2000) and Hathaway (1985). Since π_c(·), m_c(·), and σ_c²(·) are nonparametric, we need nonparametric smoothing techniques for (2.1).
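To make model (2.1) concrete, the following minimal Python sketch evaluates the conditional density of Y given X = x for user-supplied component functions; the function names and the toy two-component functions at the end are illustrative assumptions, not part of the paper.

import numpy as np
from scipy.stats import norm

def mixture_density(y, x, pi_fns, m_fns, s2_fns):
    # Conditional density of Y given X = x under model (2.1):
    # sum_c pi_c(x) * N(y; m_c(x), sigma_c^2(x)).
    dens = 0.0
    for pi_c, m_c, s2_c in zip(pi_fns, m_fns, s2_fns):
        dens += pi_c(x) * norm.pdf(y, loc=m_c(x), scale=np.sqrt(s2_c(x)))
    return dens

# Hypothetical two-component example:
pi1 = lambda x: np.exp(0.5 * x) / (1.0 + np.exp(0.5 * x))
pi_fns = [pi1, lambda x: 1.0 - pi1(x)]
m_fns = [lambda x: np.sin(2 * np.pi * x), lambda x: 2.0 + x]
s2_fns = [lambda x: 0.25, lambda x: 0.16]
print(mixture_density(1.0, 0.3, pi_fns, m_fns, s2_fns))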

In this paper, we will employ kernel regression for model (2.1). Suppose we want to estimate the unknown functions at x_0. In kernel regression, we first use local constants π_{c0}, σ²_{c0}, and m_{c0} to approximate π_c(x_0), σ_c²(x_0), and m_c(x_0). Let K_h(·) = h^{-1}K(·/h) be a rescaling of a kernel function K(·) with bandwidth h. Furthermore, denote by φ(y | μ, σ²) the density function of N(μ, σ²). Then the corresponding local log-likelihood function for the data {(x_i, y_i), i = 1, 2, ..., n} is

\ell_n(\pi_0, \sigma^2_0, m_0) = \sum_{i=1}^{n} \log\Big\{ \sum_{c=1}^{C} \pi_{c0} \, \phi(y_i \mid m_{c0}, \sigma_{c0}^2) \Big\} K_h(x_i - x_0),   (2.2)

where m_0 = (m_{10}, ..., m_{C0})^T, σ²_0 = (σ²_{10}, ..., σ²_{C0})^T, π_0 = (π_{10}, ..., π_{C-1,0})^T, and π_{C0} = 1 − Σ_{c=1}^{C-1} π_{c0}.

One may also apply local linear or local polynomial regression techniques to estimate π_c(x), m_c(x), and σ_c²(x). Local linear regression has several nice statistical properties (Fan and Gijbels, 1996). However, in our model setting local linear regression does not yield closed-form solutions for the variance functions and the mixing proportion functions in the M-step of the proposed EM algorithm; see (2.7), (2.8), and (2.9) for details. Thus, we use kernel regression to estimate the nonparametric functions.

Let {π̂_0, σ̂²_0, m̂_0} be the maximizer of the local likelihood function (2.2). Then the estimates of π_c(x_0), σ_c²(x_0), and m_c(x_0) are π̂_c(x_0) = π̂_{c0}, σ̂_c²(x_0) = σ̂²_{c0}, and m̂_c(x_0) = m̂_{c0}.
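As a sketch of how (2.2) can be evaluated numerically (the Epanechnikov kernel choice here anticipates Section 3 and is otherwise an assumption of this illustration):

import numpy as np

def epanechnikov(u):
    # K(u) = 0.75 (1 - u^2) on [-1, 1]
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_loglik(pi0, m0, s20, x, y, x0, h):
    # Local log-likelihood (2.2) at x0 for the local constants
    # pi_c0, m_c0, sigma_c0^2 (length-C arrays; pi0 must sum to one).
    kh = epanechnikov((x - x0) / h) / h                    # K_h(x_i - x0)
    comp = pi0 * np.exp(-(y[:, None] - m0)**2 / (2.0 * s20)) \
           / np.sqrt(2.0 * np.pi * s20)                    # (n, C) terms
    return np.sum(np.log(comp.sum(axis=1)) * kh)

Maximizing local_loglik over (pi0, m0, s20) at each grid point, subject to the proportions summing to one, gives the kernel regression estimates; the EM algorithm of Section 2.3 performs this maximization in closed form.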

2.2. Asymptotic Properties

We next study the asymptotic properties of π̂_c(x_0), σ̂_c²(x_0), and m̂_c(x_0). Let θ = (π^T, σ^{2T}, m^T)^T, and denote

\eta(y \mid \theta) = \sum_{c=1}^{C} \pi_c \, \phi(y \mid m_c, \sigma_c^2), \qquad \ell(\theta, y) = \log \eta(y \mid \theta).

Let θ(x_0) = {π(x_0)^T, σ²(x_0)^T, m(x_0)^T}^T, and define

\eta_1\{\theta(x), y\} = \frac{\partial \eta\{y \mid \theta(x)\}}{\partial \theta}, \qquad \eta_2\{\theta(x), y\} = \frac{\partial^2 \eta\{y \mid \theta(x)\}}{\partial \theta \, \partial \theta^T},

q_1\{\theta(x), y\} = \frac{\partial \ell\{\theta(x), y\}}{\partial \theta}, \qquad q_2\{\theta(x), y\} = \frac{\partial^2 \ell\{\theta(x), y\}}{\partial \theta \, \partial \theta^T},

I(x_0) = -E[q_2\{\theta(x_0), Y\} \mid X = x_0], \qquad \Lambda(x) = \int_{\mathcal{Y}} q_1\{\theta(x_0), y\} \, \eta\{y \mid \theta(x)\} \, dy.

Denote γ_n = (nh)^{-1/2} and

\hat m_c^* = \sqrt{nh}\,\{\hat m_{c0} - m_c(x_0)\}, \quad \hat\sigma_c^{2*} = \sqrt{nh}\,\{\hat\sigma_{c0}^2 - \sigma_c^2(x_0)\}, \quad \hat\pi_c^* = \sqrt{nh}\,\{\hat\pi_{c0} - \pi_c(x_0)\},

\hat\pi_C^* = \sqrt{nh}\,\{\hat\pi_{C0} - \pi_C(x_0)\} = -\sqrt{nh}\sum_{c=1}^{C-1}\{\hat\pi_{c0} - \pi_c(x_0)\}.

Let m̂* = (m̂*_1, ..., m̂*_C)^T, σ̂²* = (σ̂²*_1, ..., σ̂²*_C)^T, π̂* = (π̂*_1, ..., π̂*_{C-1})^T, and θ̂* = {(π̂*)^T, (σ̂²*)^T, (m̂*)^T}^T. The asymptotic bias, variance, and normality of the resulting estimate are given in the following theorem, whose proof is given in Section 5.

Theorem 1. Suppose that conditions (A)-(F) in Section 5 hold. Then

\sqrt{nh}\,\{\gamma_n \hat\theta^* - B(x_0) + o_p(h^2)\} \xrightarrow{D} N\{0, \, f^{-1}(x_0)\,\nu_0\, I^{-1}(x_0)\},

where f(·) is the marginal density function of X,

B(x_0) = I^{-1}(x_0)\Big\{\Lambda'(x_0)\,\frac{f'(x_0)}{f(x_0)} + \frac{1}{2}\,\Lambda''(x_0)\Big\}\,\kappa_2 h^2,

κ_l = ∫ u^l K(u) du, and ν_l = ∫ u^l K²(u) du.

2.3. An Effective EM Algorithm

For a given x_0, one may easily maximize the local likelihood function (2.2) using an EM algorithm. In practice, however, we typically want to evaluate the unknown functions at a set of grid points over an interval of x, which requires maximizing (2.2) at different grid points. This imposes some challenges because the labels in the EM algorithm may change across grid points, similarly to the label switching problem that occurs when one conducts a bootstrap to get confidence intervals for parameters in a mixture model. Thus, a naive implementation of the EM algorithm may fail to yield smooth estimated curves. An illustration is given in Figure 2.1, in which the solid curves are obtained when the labels at the grid points u_{j-1}, u_j, and u_{j+1} match correctly, while the dotted curves are obtained when the labels at u_j do not match those at u_{j-1} and u_{j+1}.

[Figure 2.1: An illustration of mismatched labels in a naive implementation of the EM algorithm. Solid curves are obtained when the labels at u_{j-1}, u_j, and u_{j+1} match correctly; dotted curves are obtained when the labels at u_j do not match those at u_{j-1} and u_{j+1}.]

In this section, we propose an effective EM algorithm to deal with this issue. The key idea is that in the M-step of the EM algorithm we update the estimated curves at all grid points for the labels given in the E-step. Define Bernoulli random variables

z_{ic} = \begin{cases} 1, & \text{if } (x_i, y_i) \text{ is in the } c\text{th group}, \\ 0, & \text{otherwise}, \end{cases}

and let z_i = (z_{i1}, ..., z_{iC})^T. The complete data are {(x_i, y_i, z_i), i = 1, 2, ..., n}, and the complete log-likelihood function is

\sum_{i=1}^{n}\sum_{c=1}^{C} z_{ic}\big[\log \pi_c(x_i) + \log \phi\{y_i \mid m_c(x_i), \sigma_c^2(x_i)\}\big].

At the l-th step of the EM algorithm iteration, we have m_c^{(l)}(·), σ_c^{2(l)}(·), and π_c^{(l)}(·). In the E-step, the expectation of the latent variable z_{ic} is given by

r_{ic}^{(l)} = \frac{\pi_c^{(l)}(x_i)\,\phi\{y_i \mid m_c^{(l)}(x_i), \sigma_c^{2(l)}(x_i)\}}{\sum_{c'=1}^{C} \pi_{c'}^{(l)}(x_i)\,\phi\{y_i \mid m_{c'}^{(l)}(x_i), \sigma_{c'}^{2(l)}(x_i)\}}.   (2.3)

Let {u_1, ..., u_N} be the set of grid points at which the unknown functions are evaluated, where N is the number of grid points. In the M-step, we maximize

\sum_{i=1}^{n}\sum_{c=1}^{C} r_{ic}^{(l)}\big[\log \pi_{c0} + \log \phi\{y_i \mid m_{c0}, \sigma_{c0}^2\}\big] K_h(x_i - x_0)   (2.4)

for x_0 = u_j, j = 1, ..., N. In practice, if n is not very large, one may directly take the observed {x_1, ..., x_n} as the grid points, in which case N = n.

The maximization of (2.4) is equivalent to maximizing

\sum_{i=1}^{n}\sum_{c=1}^{C} r_{ic}^{(l)} \log \pi_{c0}\, K_h(x_i - x_0)   (2.5)

and, for each c = 1, ..., C,

\sum_{i=1}^{n} r_{ic}^{(l)} \log \phi\{y_i \mid m_{c0}, \sigma_{c0}^2\}\, K_h(x_i - x_0)   (2.6)

separately. The solution of (2.5) is, for x_0 ∈ {u_j, j = 1, ..., N},

\pi_{c0}^{(l+1)}(x_0) = \frac{\sum_{i=1}^{n} r_{ic}^{(l)} K_h(x_i - x_0)}{\sum_{i=1}^{n} K_h(x_i - x_0)}.   (2.7)

The closed-form solution of (2.6) is, for x_0 ∈ {u_j, j = 1, ..., N},

m_{c0}^{(l+1)}(x_0) = \frac{\sum_{i=1}^{n} w_{ic}^{(l)} y_i}{\sum_{i=1}^{n} w_{ic}^{(l)}},   (2.8)

\sigma_{c0}^{2(l+1)}(x_0) = \frac{\sum_{i=1}^{n} w_{ic}^{(l)} \{y_i - m_{c0}^{(l+1)}(x_0)\}^2}{\sum_{i=1}^{n} w_{ic}^{(l)}},   (2.9)

where w_{ic}^{(l)} = r_{ic}^{(l)} K_h(x_i - x_0). Furthermore, we update π_c^{(l+1)}(x_i), m_c^{(l+1)}(x_i), and σ_c^{2(l+1)}(x_i), i = 1, ..., n, by linearly interpolating π_{c0}^{(l+1)}(u_j), m_{c0}^{(l+1)}(u_j), and σ_{c0}^{2(l+1)}(u_j), j = 1, ..., N, respectively. With initial values of π_c(·), m_c(·), and σ_c²(·), the proposed estimation procedure is summarized in the following algorithm.

An EM algorithm:

Initial Value: Obtain initial values for π_c(·), m_c(·), and σ_c²(·) by fitting a mixture of polynomial regressions. Denote the initial values by π_c^{(1)}(·), m_c^{(1)}(·), and σ_c^{2(1)}(·). Set l = 1.

E-step: Use (2.3) to calculate r_{ic}^{(l)} for i = 1, ..., n and c = 1, ..., C.

M-step: For c = 1, ..., C and j = 1, ..., N, evaluate π_{c0}^{(l+1)}(u_j) in (2.7), m_{c0}^{(l+1)}(u_j) in (2.8), and σ_{c0}^{2(l+1)}(u_j) in (2.9). Further obtain π_c^{(l+1)}(x_i), m_c^{(l+1)}(x_i), and σ_c^{2(l+1)}(x_i) by linear interpolation.

Iterate the E-step and the M-step with l = 1, 2, ... until the algorithm converges. A code sketch of this procedure is given at the end of this section.

It is well known that an ordinary EM algorithm for parametric models possesses an ascent property, which is desirable. The modified EM algorithm can be regarded as an extension of the EM algorithm from parametric models to nonparametric models, so it is of interest to study whether it still preserves the ascent property. Let θ^{(l)}(·) = {π^{(l)}(·), σ^{2(l)}(·), m^{(l)}(·)} denote the functions estimated at the l-th step of the proposed EM algorithm. Denote θ_0 = (π_0^T, σ_0^{2T}, m_0^T)^T. We rewrite the local log-likelihood function (2.2) as

\ell_n(\theta_0) = \sum_{i=1}^{n} \ell(\theta_0, y_i)\, K_h(x_i - x_0).   (2.10)

Theorem 2. For any given point x_0 in the interval of x, suppose that θ^{(l)}(·) has a continuous first derivative, and that nh → ∞ as n → ∞ and h → 0. Then

\liminf_{n\to\infty} \; n^{-1}\big[\ell_n\{\theta^{(l+1)}(x_0)\} - \ell_n\{\theta^{(l)}(x_0)\}\big] \ge 0   (2.11)

in probability.

The proof of Theorem 2 is given in Section 5. Theorem 2 implies that when the sample size n is large enough, the proposed algorithm possesses the ascent property at any given x_0. In particular, at the grid points {u_1, ..., u_N} where the unknown curves are evaluated, we have for large n,

\ell_n\{\theta^{(l+1)}(u_j)\} - \ell_n\{\theta^{(l)}(u_j)\} \ge 0.
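The modified EM algorithm above can be written compactly. The following Python sketch (assuming an Epanechnikov kernel and a sorted grid; the function name and array layout are illustrative, not from the paper) implements the E-step (2.3), the simultaneous M-step (2.7)-(2.9), and the interpolation step:

import numpy as np

def em_mixture_of_regressions(x, y, grid, h, pi_x, m_x, s2_x,
                              max_iter=200, tol=1e-6):
    # x, y           : length-n data arrays
    # grid           : increasing grid points u_1 < ... < u_N
    # pi_x, m_x, s2_x: (C, n) arrays of initial values pi_c(x_i), m_c(x_i),
    #                  sigma_c^2(x_i), e.g. from a mixture of polynomial fits
    def kern(u):  # Epanechnikov kernel (the choice used in Section 3)
        return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

    Kh = kern((x[None, :] - grid[:, None]) / h) / h     # (N, n): K_h(x_i - u_j)
    for _ in range(max_iter):
        # E-step (2.3): posterior memberships r_ic, shape (C, n)
        dens = pi_x * np.exp(-(y - m_x) ** 2 / (2.0 * s2_x)) \
               / np.sqrt(2.0 * np.pi * s2_x)
        r = dens / dens.sum(axis=0, keepdims=True)
        # M-step (2.7)-(2.9), solved in closed form at every grid point at once
        w = r[:, None, :] * Kh[None, :, :]              # (C, N, n): w_ic^(l)
        sw = w.sum(axis=2)
        pi_g = sw / Kh.sum(axis=1)                      # (2.7)
        m_g = (w * y).sum(axis=2) / sw                  # (2.8)
        s2_g = (w * (y - m_g[:, :, None]) ** 2).sum(axis=2) / sw   # (2.9)
        # update the curves at the observations by linear interpolation,
        # which keeps the component labels matched across grid points
        new_m = np.array([np.interp(x, grid, mg) for mg in m_g])
        pi_x = np.array([np.interp(x, grid, pg) for pg in pi_g])
        s2_x = np.array([np.interp(x, grid, sg) for sg in s2_g])
        done = np.max(np.abs(new_m - m_x)) < tol
        m_x = new_m
        if done:
            break
    return pi_g, m_g, s2_g, r

Because the M-step updates all grid points against a single set of memberships r, the component labels stay matched across the grid, which is how the algorithm avoids the mismatching illustrated in Figure 2.1.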

3. Simulation and Application

In this section, we address some practical implementation issues, such as a standard error formula and bandwidth selection, for nonparametric mixture of regression models. To assess the performance of the estimates of the unknown mean functions m_c(x), we consider the square root of the average squared errors (RASE) for the mean functions,

\mathrm{RASE}_m^2 = N^{-1} \sum_{c=1}^{C}\sum_{j=1}^{N} \{\hat m_c(u_j) - m_c(u_j)\}^2,

where {u_j, j = 1, ..., N} are the grid points at which the unknown functions m_c(·) are evaluated. For simplicity, the grid points are taken evenly over the range of the x-variable. We define RASEs for the variance functions σ_c²(x) and the proportion functions π_c(x) similarly, denoted by RASE_σ and RASE_π, respectively.

3.1. Standard Error Formula

Define the fitted value for the i-th observation as a weighted sum of the estimated means,

\hat y_i = \sum_{c=1}^{C} r_{ic}\, \hat m_c(x_i),

where r_{ic} is given by (2.3) upon convergence of the effective EM algorithm. The residual is e_i = y_i - ŷ_i. Rewrite the estimate of m_c(x) in the proposed algorithm as

\hat m_c(x) = (E^T W_c E)^{-1} E^T W_c\, y,

where E is an n × 1 vector with all entries equal to one, and W_c = diag{w_{1c}, ..., w_{nc}} with w_{ic} = r_{ic} K_h(x_i - x). We consider the following approximate standard error formula for m̂_c(x):

\widehat{\mathrm{Var}}\{\hat m_c(x)\} = (E^T W_c E)^{-1} E^T W_c\, \widehat{\mathrm{Cov}}(y)\, W_c E\, (E^T W_c E)^{-1},   (3.1)

where Ĉov(y) = diag{e_1², e_2², ..., e_n²} is a diagonal matrix of the squared residuals. Furthermore, (3.1) can be written as

\widehat{\mathrm{Var}}\{\hat m_c(x)\} = \frac{\sum_{i=1}^{n} w_{ic}^2\, e_i^2}{\big(\sum_{i=1}^{n} w_{ic}\big)^2}.   (3.2)
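A minimal sketch of the sandwich formula (3.2), assuming the memberships and residuals come from a converged run such as the hypothetical em_mixture_of_regressions above:

import numpy as np

def sandwich_se(x, e, r_c, x0, h, kern):
    # Standard error formula (3.2) for the c-th mean curve at x0.
    # e   : residuals e_i = y_i - sum_c r_ic * m_hat_c(x_i)
    # r_c : length-n memberships r_ic for component c at convergence
    w = r_c * kern((x - x0) / h) / h        # w_ic = r_ic K_h(x_i - x0)
    return np.sqrt(np.sum(w**2 * e**2) / np.sum(w)**2)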

3.2. Bandwidth Selection

Bandwidth selection is fundamental to nonparametric smoothing. In this section we propose a multifold cross-validation (CV) method to choose the bandwidth. Denote by D the full data set, and partition D into a training set R_j and a test set T_j, so that D = T_j ∪ R_j, j = 1, ..., J. We use the training set R_j to obtain the estimates {m̂_c(·), σ̂_c²(·), π̂_c(·)}, and then estimate m_c(x), σ_c²(x), and π_c(x) for the data points in the corresponding test set: for (x_l, y_l) ∈ T_j,

\hat m_c(x_l) = \frac{\sum_{i:\, x_i \in R_j} r_{ic} K_h(x_i - x_l)\, y_i}{\sum_{i:\, x_i \in R_j} r_{ic} K_h(x_i - x_l)},

\hat\sigma_c^2(x_l) = \frac{\sum_{i:\, x_i \in R_j} r_{ic} K_h(x_i - x_l)\{y_i - \hat m_c(x_i)\}^2}{\sum_{i:\, x_i \in R_j} r_{ic} K_h(x_i - x_l)},

\hat\pi_c(x_l) = \frac{\sum_{i:\, x_i \in R_j} r_{ic} K_h(x_i - x_l)}{\sum_{i:\, x_i \in R_j} K_h(x_i - x_l)}.

Based on the estimates for the test set T_j, we again calculate the probability memberships in T_j: for (x_l, y_l) ∈ T_j and c = 1, ..., C,

r_{lc} = \frac{\hat\pi_c(x_l)\,\phi\{y_l \mid \hat m_c(x_l), \hat\sigma_c^2(x_l)\}}{\sum_{q=1}^{C} \hat\pi_q(x_l)\,\phi\{y_l \mid \hat m_q(x_l), \hat\sigma_q^2(x_l)\}}.

We can then apply the usual CV criterion to the proposed model,

\mathrm{CV} = \sum_{j=1}^{J} \sum_{l \in T_j} (y_l - \hat y_l)^2,   (3.3)

where ŷ_l = Σ_{c=1}^C r_{lc} m̂_c(x_l) is the predicted value of y_l in the test set T_j.
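A sketch of the multifold CV criterion (3.3); fit_on_training stands in for a full run of the EM algorithm on the training folds (for instance the hypothetical em_mixture_of_regressions above) and is an assumption of this illustration:

import numpy as np

def cv_score(x, y, h, folds, fit_on_training):
    # folds           : list of index arrays partitioning {0, ..., n-1}
    # fit_on_training : hypothetical callable (x_tr, y_tr, h) -> (pi_fn, m_fn, s2_fn),
    #                   each returning length-C arrays of curve values at a point
    cv = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(x)), test)
        pi_fn, m_fn, s2_fn = fit_on_training(x[train], y[train], h)
        for l in test:
            pi_l, m_l, s2_l = pi_fn(x[l]), m_fn(x[l]), s2_fn(x[l])
            dens = pi_l * np.exp(-(y[l] - m_l)**2 / (2.0 * s2_l)) \
                   / np.sqrt(2.0 * np.pi * s2_l)
            r_l = dens / dens.sum()                  # test-set memberships
            cv += (y[l] - np.sum(r_l * m_l))**2      # y_hat_l = sum_c r_lc m_c(x_l)
    return cv

The bandwidth is then chosen by minimizing cv_score over a grid of candidate values of h.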

3.3. Simulation Study

In the following example, we conduct a simulation for a 2-component nonparametric mixture of regressions model with

π_1(x) = exp(0.5x)/{1 + exp(0.5x)},  π_2(x) = 1 − π_1(x),
m_1(x) = 4 − sin(2πx),  m_2(x) = cos(3πx),
σ_1(x) = 0.6 exp(0.5x),  σ_2(x) = 0.5 exp(−0.2x).

For each sample size n = 200, 400, and 800, we conducted 500 simulations. The predictor x is generated from a one-dimensional uniform distribution on [0, 1]. The Epanechnikov kernel is used in our simulation.

[Table 3.1: Mean and standard deviation of RASE_m, RASE_{σ²}, and RASE_π over 500 simulations for n = 200, 400, 800 under three bandwidths.]

[Table 3.2: True standard deviations (SD) and the mean and standard deviation of the estimated standard errors, SE(Std), of the mean functions m_1(x) and m_2(x), for n = 200 (h = 0.08), n = 400 (h = 0.06), and n = 800 (h = 0.04).]

It is well known that the EM algorithm may be trapped by a local maximizer and may therefore be sensitive to initial values. To obtain a good initial value, we first fit a mixture of polynomial regression models and obtain estimates of the mean functions, m̃_c(x), and of the parameters σ̃_c² and π̃_c. We then set the initial values m_c^{(1)}(x) = m̃_c(x), σ_c^{2(1)}(x) = σ̃_c², and π_c^{(1)}(x) = π̃_c.

To demonstrate that the proposed procedure works well over a wide range of bandwidths, and to save computing time, we determine the bandwidth with the following strategy rather than running the cross-validation bandwidth selector on every generated sample. We first generate several simulated data sets for a given sample size and use the CV bandwidth selector to choose a bandwidth for each data set. This gives us an idea of the optimal bandwidth for a given sample size.
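The simulated data described above can be generated in a few lines; a sketch, assuming the sign conventions reconstructed above for m_1 and σ_2:

import numpy as np

rng = np.random.default_rng(1)

def simulate(n):
    # One sample from the 2-component design of Section 3.3; the minus signs
    # in m_1 and sigma_2 are reconstructed from the source and may differ.
    x = rng.uniform(0.0, 1.0, n)
    pi1 = np.exp(0.5 * x) / (1.0 + np.exp(0.5 * x))
    z = rng.uniform(size=n) < pi1                    # latent component labels
    m1, m2 = 4.0 - np.sin(2.0 * np.pi * x), np.cos(3.0 * np.pi * x)
    s1, s2 = 0.6 * np.exp(0.5 * x), 0.5 * np.exp(-0.2 * x)
    y = np.where(z, m1 + s1 * rng.standard_normal(n),
                    m2 + s2 * rng.standard_normal(n))
    return x, y, z

x, y, z = simulate(400)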

[Figure 3.2: (a) A typical simulated sample (n = 400) together with the true mean functions; (b) the estimated mean functions (solid lines), true mean functions (dashed lines), and 95% pointwise confidence intervals (dotted lines) with n = 400.]

We then consider three different bandwidths: half the selected bandwidth, the selected bandwidth, and twice the selected bandwidth, corresponding to under-smoothing, appropriate smoothing, and over-smoothing, respectively. Table 3.1 displays the mean and standard deviation of the RASEs over the 500 simulations. From Table 3.1, the proposed procedure performs quite well over a wide range of bandwidths.

We next test the accuracy of the standard error formula. Table 3.2 summarizes the simulation results for the unknown mean functions m_c(x) at the points x = 0.30 and x = …. The standard deviation of the 500 estimates, denoted by SD, can be viewed as the true standard error. We then calculate the sample average and standard deviation of the 500 estimated standard errors obtained from the proposed formula (3.1), denoted by SE(Std). Although some underestimation is present, the proposed sandwich formula works reasonably well: the difference between the true value and the estimate is less than twice the standard error of the estimate.

We now illustrate the performance of the proposed procedure using a typical simulated sample with n = 400, chosen as the sample with the median RASE_m among the 500 simulated samples.

Figure 3.2(a) shows the scatter plot of the typical sample and the true mean functions. Before analyzing this data set with the nonparametric mixture of regression models, we first briefly discuss how to determine the number of components C. Note that, conditional on X = x, the nonparametric mixture of regression model (2.1) is an univariate mixture of normals. Over a small neighborhood of x, m_c(x) may be treated as constant; thus, to choose C for nonparametric mixture of regression models, we suggest modeling the response variable y of the local data by a univariate mixture of normals. An illustration is given in Figure 3.2(a): we select a local region with relatively large variation in the y direction, shown as the rectangle, and traditional methods can then be used to determine C. We fit the local data with univariate mixtures of normals with one, two, three, and four components; the corresponding BIC values are 83.29, 64.19, 68.43, and …. Thus, a two-component model is selected.

We use the cross-validation (CV) criterion proposed in Section 3.2 to select a bandwidth; the CV bandwidth selector yields the bandwidth …. The resulting estimate with n = 400, along with its pointwise confidence intervals, is depicted in Figure 3.2(b), from which we can see that the true mean functions lie within the confidence intervals. This implies that the proposed estimation procedure performs quite well at this moderate sample size.

3.4. An Application

We illustrate the proposed model and estimation procedure by an analysis of a real data set, which contains the monthly change of the S&P/Case-Shiller House Price Index (HPI) and the monthly growth rate of the United States Gross Domestic Product (GDP), starting in January 1990. In the economics literature, HPI is a measure of a nation's housing cost and GDP is a measure of the size of a nation's economy. It is of interest to investigate the impact of the GDP growth rate on the HPI change and, as one would expect, this impact may have different patterns in different macroeconomic cycles. In this illustration, we set the HPI change to be the response variable and the GDP growth rate to be the predictor. The scatterplot of this data set is depicted in Figure 3.3(a).

We first determine the number of components C using the local data within the rectangle of Figure 3.3(a). The BIC values for univariate mixtures of normals with one, two, and three components are 13.46, 2.34, and 2.57, respectively. Thus, a two-component model is selected.
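The local BIC comparison used above can be sketched with scikit-learn's one-dimensional Gaussian mixture; the window boundaries below are placeholders for the hand-picked rectangle:

import numpy as np
from sklearn.mixture import GaussianMixture

def choose_C_locally(x, y, x_lo, x_hi, max_C=4):
    # Fit univariate mixtures of normals to the local responses and compare
    # BIC, following the suggestion in Section 3.3.
    y_local = y[(x >= x_lo) & (x <= x_hi)].reshape(-1, 1)
    bics = [GaussianMixture(n_components=C, n_init=5).fit(y_local).bic(y_local)
            for C in range(1, max_C + 1)]
    return int(np.argmin(bics)) + 1, bics

# e.g. C_hat, bics = choose_C_locally(x, y, 0.4, 0.6)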

We next select the bandwidth by the 5-fold CV selector described in (3.3), and with the selected optimal bandwidth we fit a 2-component nonparametric mixture of regression model. The estimated mean functions are shown in Figure 3.3(b). We further depict the hard-clustering result, obtained by assigning each observation to the component with the largest r_{ic}, c = 1, ..., C. The triangular points in the lower cluster are mainly from January 1990 to September 1997. The circle points in the upper cluster are mainly from October 1997 to December 2002, during which the economy experienced an internet boom and the following bust. We observe that in the upper component the HPI change tends to be lower when GDP growth is modest. The estimated mixing proportion function is plotted in Figure 3.3(c), and the estimated variance functions are plotted in Figure 3.3(d).

We compare the proposed nonparametric finite mixture of regression models with two other models: a local linear regression model and a 2-component mixture of linear regression models. The optimal bandwidth for local linear regression is selected by a plug-in method (Ruppert et al., 1995), and the mixture of linear regression models is estimated by an EM algorithm. We randomly select 5% of the data as a test set and use the rest as a training set; estimates for the three models are obtained from the training set, and prediction errors are calculated from the deviations on the test set. Over 100 such random splits, the proposed nonparametric mixture of regression models yields a 7.26% reduction in average prediction error compared with the mixture of linear regression models, and a 57.43% reduction compared with local linear regression. The improved prediction error demonstrates the advantage of the nonparametric finite mixture of regression models in the analysis of the US house price index data.

[Figure 3.3: (a) Scatterplot of the US house price index data; (b) estimated mean functions with 95% confidence intervals and the hard-clustering result; (c) estimated mixing proportion function; (d) estimated variance functions.]

4. Discussion

In this paper, we proposed a class of nonparametric finite mixture of regression models in which the mixing proportions, the mean functions, and the variance functions are all nonparametric functions of the covariate. We proposed an estimation procedure for the newly proposed model based on kernel regression and studied the sampling properties of the resulting estimate: we derived its asymptotic bias and variance and established its asymptotic normality. We proposed an effective EM algorithm to compute the local maximum likelihood estimate and studied its ascent property, finding that for large samples the effective EM algorithm still possesses the monotone ascent property. The performance of the proposed procedures was examined empirically by a Monte Carlo simulation and illustrated by the analysis of a real data example.

In this paper, we focused on estimation of the newly proposed nonparametric finite mixture of regression models when x_0 is an interior point of the range of the covariate. It is certainly of interest to study the boundary behavior of the proposed procedure; boundary effects have been studied in Cheng, Fan and Marron (1997) for the nonparametric regression model. It is also of interest to test whether the mixing proportions are constant, whether some of the mean functions are constant or have specific parametric forms, and whether the variance functions are parametric. These are topics for future research.

5. Proofs

In this section we outline the key steps of the proofs of Theorems 1 and 2. Note that θ = (π^T, σ^{2T}, m^T)^T is a (3C − 1) × 1 vector. Whenever necessary, we rewrite θ = (θ_1, ..., θ_{3C−1})^T without changing the order of π, σ², and m. Otherwise, we use the same notation as defined in Section 2.

Regularity Conditions

A. The sample {(x_i, y_i), i = 1, ..., n} is independent and identically distributed from its population (X, Y). The support of x, denoted by 𝒳, is a closed and bounded subset of ℝ¹. The density η{y | θ(x_0)} is identifiable up to a permutation of the mixture components.

B. The marginal density function f(x) has a continuous first derivative and is positive for x ∈ 𝒳.

C. There exists a function M(x, y) with E{M(X, Y)} < ∞ such that |∂³ℓ(θ, y)/∂θ_j ∂θ_k ∂θ_l| ≤ M(x, y) for all x, y, and all θ in a neighborhood of θ(x_0).

D. The unknown functions θ(x) have continuous second derivatives.

E. The kernel function K(·) has bounded support and satisfies

\int K(t)\,dt = 1, \qquad \int t\,K(t)\,dt = 0, \qquad \int t^2 K(t)\,dt = \kappa_2 > 0.

F. The following conditions hold for all i and j:

E\Big(\Big|\frac{\partial \ell(\theta(x_0), Y)}{\partial \theta_j}\Big|^3\Big) < \infty, \qquad E\Big[\Big\{\frac{\partial^2 \ell(\theta(x_0), Y)}{\partial \theta_i \,\partial \theta_j}\Big\}^2\Big] < \infty.

All of these conditions are mild and have been used in the literature on local likelihood estimation and mixture models. The following lemma will be used in the proof of Theorem 1.

Lemma 1. Under Conditions A, C, and F, for x_0 in the interior of 𝒳, it holds that

E[q_1\{\theta(x_0), Y\} \mid X = x_0] = 0,   (5.1)

E[q_2\{\theta(x_0), Y\} \mid X = x_0] = -E[q_1\{\theta(x_0), Y\}\, q_1^T\{\theta(x_0), Y\} \mid X = x_0].   (5.2)

Proof. Conditional on X = x_0, Y follows a finite mixture of normals. Thus (5.1) holds by a direct calculation: since q_1 = η_1/η, we have E[q_1{θ(x_0), Y} | X = x_0] = ∫ η_1{θ(x_0), y} dy = (∂/∂θ) ∫ η{y | θ(x_0)} dy = 0, because η integrates to one for every θ. Furthermore, (5.2) follows from regularity conditions C and F together with the arguments on page 39 of McLachlan and Peel (2000). This completes the proof of the lemma.

We refer to (5.1) and (5.2) as the local Bartlett first and second identities, respectively. In particular, (5.1) implies that Λ(x_0) = 0.

Proof of Theorem 1. Denote

m_c^* = \sqrt{nh}\,\{m_{c0} - m_c(x_0)\}, \quad \sigma_c^{2*} = \sqrt{nh}\,\{\sigma_{c0}^2 - \sigma_c^2(x_0)\}, \quad \pi_c^* = \sqrt{nh}\,\{\pi_{c0} - \pi_c(x_0)\},

\pi_C^* = \sqrt{nh}\,\{\pi_{C0} - \pi_C(x_0)\} = -\sqrt{nh}\sum_{c=1}^{C-1}\{\pi_{c0} - \pi_c(x_0)\}.

Let m* = (m*_1, ..., m*_C)^T, σ²* = (σ²*_1, ..., σ²*_C)^T, π* = (π*_1, ..., π*_{C−1})^T, and θ* = (π*^T, σ²*^T, m*^T)^T. Recall that

\ell(\theta(x_0), y) = \log \eta\{y \mid \theta(x_0)\} = \log\Big[\sum_{c=1}^{C} \pi_c(x_0)\,\phi\{y \mid m_c(x_0), \sigma_c^2(x_0)\}\Big].

Let

\ell(\theta(x_0) + \gamma_n \theta^*, y) = \log\Big[\sum_{c=1}^{C} \{\pi_c(x_0) + \gamma_n \pi_c^*\}\, \phi\{y \mid m_c(x_0) + \gamma_n m_c^*, \; \sigma_c^2(x_0) + \gamma_n \sigma_c^{2*}\}\Big].

Thus, if {π̂_0, σ̂²_0, m̂_0} maximizes (2.2), then θ̂* maximizes

\ell_n^*(\theta^*) = h \sum_{i=1}^{n} \{\ell(\theta(x_0) + \gamma_n \theta^*, y_i) - \ell(\theta(x_0), y_i)\}\, K_h(x_i - x_0).   (5.3)

By a Taylor expansion,

\ell_n^*(\theta^*) = \Delta_n^T \theta^* - \tfrac{1}{2}\, \theta^{*T} \Gamma_n \theta^* + \frac{h \gamma_n^3}{6}\, R(\theta(x_0), \hat\xi),   (5.4)

where ξ̂ is a vector lying between 0 and γ_n θ*, and

\Delta_n = \sqrt{h/n} \sum_{i=1}^{n} q_1\{\theta(x_0), y_i\}\, K_h(x_i - x_0),

\Gamma_n = -\frac{1}{n} \sum_{i=1}^{n} q_2\{\theta(x_0), y_i\}\, K_h(x_i - x_0),

R(\theta(x_0), \hat\xi) = \sum_{i=1}^{n} \sum_{j,k,l} \frac{\partial^3 \ell(\theta(x_0) + \hat\xi, y_i)}{\partial\theta_j \,\partial\theta_k \,\partial\theta_l}\, \theta_j^* \theta_k^* \theta_l^*\, K_h(x_i - x_0).

Denote by Γ_n(i, j) the (i, j) element of Γ_n. By Condition E, it can be shown that

E\,\Gamma_n(i, j) = -\int_{\mathcal{Y}} \int_{\mathcal{X}} \frac{\partial^2 \ell(\theta(x_0), y)}{\partial\theta_i \,\partial\theta_j}\, \eta\{y \mid \theta(x)\}\, f(x)\, K_h(x - x_0)\, dx\, dy = -f(x_0) \int_{\mathcal{Y}} \frac{\partial^2 \ell(\theta(x_0), y)}{\partial\theta_i \,\partial\theta_j}\, \eta\{y \mid \theta(x_0)\}\, dy + o(1).

Therefore EΓ_n = f(x_0) I(x_0) + o(1). The variance Var{Γ_n(i, j)} is dominated by the term

\frac{1}{n} \int_{\mathcal{Y}} \int_{\mathcal{X}} \Big\{\frac{\partial^2 \ell(\theta(x_0), y)}{\partial\theta_i \,\partial\theta_j}\Big\}^2 \eta\{y \mid \theta(x)\}\, f(x)\, K_h^2(x - x_0)\, dx\, dy,

which can be shown to be of order O{(nh)^{-1}} under Condition F. Therefore

\Gamma_n = f(x_0)\, I(x_0) + o_p(1).

By Condition C, the expectation of the absolute value of the last term in (5.4) is bounded by

O\Big(\gamma_n\, E\Big[\max_{j,k,l} \Big|\frac{\partial^3 \ell(\theta(x_0) + \hat\xi, Y)}{\partial\theta_j \,\partial\theta_k \,\partial\theta_l}\Big|\, K_h(X - x_0)\Big]\Big) = O(\gamma_n),   (5.5)

so the last term of (5.4) is of order O_p(γ_n). Therefore

\ell_n^*(\theta^*) = \Delta_n^T \theta^* - \tfrac{1}{2}\, f(x_0)\, \theta^{*T} I(x_0)\, \theta^* + o_p(1).   (5.6)

Using the quadratic approximation lemma (see, for example, p. 210 of Fan and Gijbels, 1996), we have

\hat\theta^* = f(x_0)^{-1} I(x_0)^{-1} \Delta_n + o_p(1).   (5.7)

To establish asymptotic normality, it remains to calculate the mean and variance of Δ_n and to verify the Lyapounov condition. Note that

E(\Delta_n) = \sqrt{nh} \int_{\mathcal{Y}}\int_{\mathcal{X}} q_1\{\theta(x_0), y\}\, \eta\{y \mid \theta(x)\}\, f(x)\, K_h(x - x_0)\, dx\, dy = \sqrt{nh} \int_{\mathcal{X}} \Lambda(x)\, f(x)\, K_h(x - x_0)\, dx.

Under Conditions C, D, and F, Λ(x) has a continuous second derivative. Using the fact that Λ(x_0) = 0 (Lemma 1) and standard arguments in kernel regression, it follows that

E(\Delta_n) = \sqrt{nh}\, f(x_0) \Big\{\frac{f'(x_0)\,\Lambda'(x_0)}{f(x_0)} + \frac{1}{2}\,\Lambda''(x_0)\Big\}\, \kappa_2 h^2 + o(\sqrt{nh}\, h^2).

For the covariance term of Δ_n, we have

\mathrm{Cov}(\Delta_n) = h\, E\big\{q_1\{\theta(x_0), Y\}\, q_1^T\{\theta(x_0), Y\}\, K_h^2(X - x_0)\big\} + o(1),

whose (i, j) element is

h \int_{\mathcal{Y}}\int_{\mathcal{X}} \frac{\partial \ell(\theta(x_0), y)}{\partial\theta_i}\, \frac{\partial \ell(\theta(x_0), y)}{\partial\theta_j}\, K_h^2(x - x_0)\, f(x)\, \eta\{y \mid \theta(x)\}\, dx\, dy \;\to\; f(x_0)\,\nu_0 \int_{\mathcal{Y}} \frac{\partial \ell(\theta(x_0), y)}{\partial\theta_i}\, \frac{\partial \ell(\theta(x_0), y)}{\partial\theta_j}\, \eta\{y \mid \theta(x_0)\}\, dy = -f(x_0)\,\nu_0 \int_{\mathcal{Y}} \frac{\partial^2 \ell(\theta(x_0), y)}{\partial\theta_i \,\partial\theta_j}\, \eta\{y \mid \theta(x_0)\}\, dy.

The last step holds due to (5.2). Thus,

\mathrm{Cov}(\Delta_n) = f(x_0)\, I(x_0)\, \nu_0 + o(1).

In order to establish asymptotic normality for Δ_n, it is necessary to show that, for any unit vector d,

\{d^T \mathrm{Cov}(\Delta_n)\, d\}^{-1/2}\, d^T\{\Delta_n - E(\Delta_n)\} \xrightarrow{D} N(0, 1).   (5.8)

Since Cov(Δ_n) = O(1), it follows that {d^T Cov(Δ_n) d} = O(1). Let λ_i = d^T q_1{θ(x_0), y_i} K_h(x_i − x_0), so that d^T Δ_n = hγ_n Σ_{i=1}^n λ_i. It then suffices to show that

n h^3 \gamma_n^3\, E(|\lambda_i|^3) = o(1).

By Condition F and arguments similar to (5.5), it can be shown that n h³γ_n³ E(|λ_i|³) = O(γ_n) = o(1), and thus Lyapounov's condition holds for (5.8). By (5.7) and Slutsky's theorem, we have

\sqrt{nh}\,\{\gamma_n \hat\theta^* - B(x_0) + o_p(h^2)\} \xrightarrow{D} N\{0, \, f^{-1}(x_0)\,\nu_0\, I^{-1}(x_0)\}.   (5.9)

Proof of Theorem 2. We assume the unobserved labels (C_i, i = 1, ..., n) are a random sample from the population C, so that the complete data {(x_i, y_i, C_i), i = 1, 2, ..., n} are a random sample from (X, Y, C). Let h{y, c | θ(x)} be the joint density of (Y, C) given X = x, and let f(x) be the marginal density of X. Conditional on X = x, Y has density η{y | θ(x)}. Then the conditional density of C is given by

g\{c \mid y, \theta(x)\} = \frac{h\{y, c \mid \theta(x)\}}{\eta\{y \mid \theta(x)\}}.   (5.10)

The local log-likelihood function (2.2) can be rewritten as

\ell_n(\theta_0) = \sum_{i=1}^{n} \log\{\eta(y_i \mid \theta_0)\}\, K_h(x_i - x_0).   (5.11)

Given fixed θ^{(l)}(x_i), i = 1, ..., n, it is obvious that ∫ g{c | y_i, θ^{(l)}(x_i)} dc = 1. Then the local likelihood (5.11) is

\ell_n(\theta) = \sum_{i=1}^{n} \log\{\eta(y_i \mid \theta)\} \Big[\int g\{c \mid y_i, \theta^{(l)}(x_i)\}\, dc\Big] K_h(x_i - x_0) = \sum_{i=1}^{n} \Big[\int \log\{\eta(y_i \mid \theta)\}\, g\{c \mid y_i, \theta^{(l)}(x_i)\}\, dc\Big] K_h(x_i - x_0).   (5.12)

By (5.10), we also have

\log\{\eta(y_i \mid \theta)\} = \log\{h(y_i, c \mid \theta)\} - \log\{g(c \mid y_i, \theta)\}.   (5.13)

Thus, we have

\ell_n(\theta) = \sum_{i=1}^{n} E\big[\log\{h(y_i, C \mid \theta)\} \mid \theta^{(l)}(x_i)\big]\, K_h(x_i - x_0) - \sum_{i=1}^{n} E\big[\log\{g(C \mid y_i, \theta)\} \mid \theta^{(l)}(x_i)\big]\, K_h(x_i - x_0),   (5.14)

where θ^{(l)}(x_i) denotes the l-th step local estimate at x_i; taking the expectation here is equivalent to calculating (2.3). In the M-step, we choose θ^{(l+1)}(x_0) such that

\frac{1}{n}\sum_{i=1}^{n} E\big[\log h\{y_i, C \mid \theta^{(l+1)}(x_0)\} \mid \theta^{(l)}(x_i)\big]\, K_h(x_i - x_0) \;\ge\; \frac{1}{n}\sum_{i=1}^{n} E\big[\log h\{y_i, C \mid \theta^{(l)}(x_0)\} \mid \theta^{(l)}(x_i)\big]\, K_h(x_i - x_0).

It therefore suffices to show that

\liminf_{n\to\infty}\; \Big[-\frac{1}{n}\sum_{i=1}^{n} E\Big\{\log\frac{g\{C \mid y_i, \theta^{(l+1)}(x_0)\}}{g\{C \mid y_i, \theta^{(l)}(x_0)\}} \,\Big|\, \theta^{(l)}(x_i)\Big\}\, K_h(x_i - x_0)\Big] \ge 0   (5.15)

in probability. Denote

L_g = \frac{1}{n}\sum_{i=1}^{n} E\Big\{\log\frac{g\{C \mid y_i, \theta^{(l+1)}(x_0)\}}{g\{C \mid y_i, \theta^{(l)}(x_0)\}} \,\Big|\, \theta^{(l)}(x_i)\Big\}\, K_h(x_i - x_0),

and

L_J = \frac{1}{n}\sum_{i=1}^{n} \log E\Big\{\frac{g\{C \mid y_i, \theta^{(l+1)}(x_0)\}}{g\{C \mid y_i, \theta^{(l)}(x_0)\}} \,\Big|\, \theta^{(l)}(x_i)\Big\}\, K_h(x_i - x_0).

By Jensen's inequality, L_g ≤ L_J. Next we show that L_J → 0 in probability. To this end, we first calculate the expectation of L_J,

E(L_J) = E\Big(\log\Big[\int_{\mathcal{C}} \frac{g\{c \mid Y, \theta^{(l+1)}(x_0)\}}{g\{c \mid Y, \theta^{(l)}(x_0)\}}\, g\{c \mid Y, \theta^{(l)}(X)\}\, dc\Big]\, K_h(X - x_0)\Big),

which tends to 0 by a standard argument. We next calculate the variance of L_J, which is dominated by the term

\frac{1}{n}\, E\Big(\Big[\log\Big\{\int_{\mathcal{C}} \frac{g\{c \mid Y, \theta^{(l+1)}(x_0)\}}{g\{c \mid Y, \theta^{(l)}(x_0)\}}\, g\{c \mid Y, \theta^{(l)}(X)\}\, dc\Big\}\, K_h(X - x_0)\Big]^2\Big).

This term can be shown to be of order O{(nh)^{-1}}. Hence L_J = o_p(1) by Chebyshev's inequality, and since L_g ≤ L_J, (5.15) follows. This completes the proof.

Acknowledgments

Huang's research was supported by National Science Foundation grant DMS and National Institute on Drug Abuse grant P50 DA10075, as a research assistant during his graduate study at the Pennsylvania State University. Li's research was supported by National Institute on Drug Abuse grant R21 DA. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDA or the NIH.

References

Cheng, M. Y., Fan, J. and Marron, J. S. (1997). On Automatic Boundary Corrections. Annals of Statistics 25.

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B 39.

Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman and Hall, London.

Fan, J. (1993). Local Linear Regression Smoothers and Their Minimax Efficiencies. Annals of Statistics 21.

Fan, J. and Gijbels, I. (1992). Variable Bandwidth and Local Linear Regression Smoothers. Annals of Statistics 20.

Fan, J. and Yao, Q. (1998). Efficient Estimation of Conditional Variance Functions in Stochastic Regression. Biometrika 85.

Frühwirth-Schnatter, S. (2001). Markov Chain Monte Carlo Estimation of Classical and Dynamic Switching and Mixture Models. Journal of the American Statistical Association 96.

Frühwirth-Schnatter, S. (2005). Finite Mixture and Markov Switching Models. Springer.

Green, P. J. and Richardson, S. (2002). Hidden Markov Models and Disease Mapping. Journal of the American Statistical Association 97.

Goldfeld, S. M. and Quandt, R. E. (1976). A Markov Model for Switching Regressions. Journal of Econometrics 1.

Hathaway, R. J. (1985). A Constrained Formulation of Maximum-Likelihood Estimation for Normal Mixture Distributions. Annals of Statistics 13.

Hennig, C. (2000). Identifiability of Models for Clusterwise Linear Regression. Journal of Classification 17.

Lindsay, B. G. (1995). Mixture Models: Theory, Geometry, and Applications. Institute of Mathematical Statistics, Hayward.

McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. Wiley, New York.

Ruppert, D., Sheather, S. J. and Wand, M. P. (1995). An Effective Bandwidth Selector for Local Least Squares Regression. Journal of the American Statistical Association 90.

Rossi, P. E., Allenby, G. M. and McCulloch, R. (2005). Bayesian Statistics and Marketing. Wiley, Chichester.

Wang, P., Puterman, M. L., Cockburn, I. and Le, N. (1996). Mixed Poisson Regression Models with Covariate Dependent Rates. Biometrics 52.

Wedel, M. and DeSarbo, W. S. (1993). A Latent Class Binomial Logit Methodology for the Analysis of Paired Comparison Data. Decision Sciences 24.

Department of Statistics, Shanghai University of Finance and Economics, Shanghai, China; mzh136@stat.psu.edu

Department of Statistics and The Methodology Center, The Pennsylvania State University, University Park, Pennsylvania, USA; rli@stat.psu.edu


Simple and Efficient Improvements of Multivariate Local Linear Regression Journal of Multivariate Analysis Simple and Efficient Improvements of Multivariate Local Linear Regression Ming-Yen Cheng 1 and Liang Peng Abstract This paper studies improvements of multivariate local

More information

Estimating the parameters of hidden binomial trials by the EM algorithm

Estimating the parameters of hidden binomial trials by the EM algorithm Hacettepe Journal of Mathematics and Statistics Volume 43 (5) (2014), 885 890 Estimating the parameters of hidden binomial trials by the EM algorithm Degang Zhu Received 02 : 09 : 2013 : Accepted 02 :

More information

Lecture Notes 15 Prediction Chapters 13, 22, 20.4.

Lecture Notes 15 Prediction Chapters 13, 22, 20.4. Lecture Notes 15 Prediction Chapters 13, 22, 20.4. 1 Introduction Prediction is covered in detail in 36-707, 36-701, 36-715, 10/36-702. Here, we will just give an introduction. We observe training data

More information

Nonparametric Density Estimation

Nonparametric Density Estimation Nonparametric Density Estimation Econ 690 Purdue University Justin L. Tobias (Purdue) Nonparametric Density Estimation 1 / 29 Density Estimation Suppose that you had some data, say on wages, and you wanted

More information

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

12 - Nonparametric Density Estimation

12 - Nonparametric Density Estimation ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6

More information

Local linear multivariate. regression with variable. bandwidth in the presence of. heteroscedasticity

Local linear multivariate. regression with variable. bandwidth in the presence of. heteroscedasticity Model ISSN 1440-771X Department of Econometrics and Business Statistics http://www.buseco.monash.edu.au/depts/ebs/pubs/wpapers/ Local linear multivariate regression with variable bandwidth in the presence

More information

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract

More information

On a Nonparametric Notion of Residual and its Applications

On a Nonparametric Notion of Residual and its Applications On a Nonparametric Notion of Residual and its Applications Bodhisattva Sen and Gábor Székely arxiv:1409.3886v1 [stat.me] 12 Sep 2014 Columbia University and National Science Foundation September 16, 2014

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Estimation of cumulative distribution function with spline functions

Estimation of cumulative distribution function with spline functions INTERNATIONAL JOURNAL OF ECONOMICS AND STATISTICS Volume 5, 017 Estimation of cumulative distribution function with functions Akhlitdin Nizamitdinov, Aladdin Shamilov Abstract The estimation of the cumulative

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Finite Population Sampling and Inference

Finite Population Sampling and Inference Finite Population Sampling and Inference A Prediction Approach RICHARD VALLIANT ALAN H. DORFMAN RICHARD M. ROYALL A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane

More information

Introduction An approximated EM algorithm Simulation studies Discussion

Introduction An approximated EM algorithm Simulation studies Discussion 1 / 33 An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values Gong Tang Department of Biostatistics, GSPH University of Pittsburgh NISS Workshop on Nonignorable Nonresponse

More information

Statistical Estimation

Statistical Estimation Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from

More information

An Extended BIC for Model Selection

An Extended BIC for Model Selection An Extended BIC for Model Selection at the JSM meeting 2007 - Salt Lake City Surajit Ray Boston University (Dept of Mathematics and Statistics) Joint work with James Berger, Duke University; Susie Bayarri,

More information

Estimation, Inference, and Hypothesis Testing

Estimation, Inference, and Hypothesis Testing Chapter 2 Estimation, Inference, and Hypothesis Testing Note: The primary reference for these notes is Ch. 7 and 8 of Casella & Berger 2. This text may be challenging if new to this topic and Ch. 7 of

More information

A Simple Solution to Bayesian Mixture Labeling

A Simple Solution to Bayesian Mixture Labeling Communications in Statistics - Simulation and Computation ISSN: 0361-0918 (Print) 1532-4141 (Online) Journal homepage: http://www.tandfonline.com/loi/lssp20 A Simple Solution to Bayesian Mixture Labeling

More information

Smooth simultaneous confidence bands for cumulative distribution functions

Smooth simultaneous confidence bands for cumulative distribution functions Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang

More information

Stat 451 Lecture Notes Numerical Integration

Stat 451 Lecture Notes Numerical Integration Stat 451 Lecture Notes 03 12 Numerical Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange 2 Updated: February 11, 2016 1 / 29

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Introduction. Linear Regression. coefficient estimates for the wage equation: E(Y X) = X 1 β X d β d = X β

Introduction. Linear Regression. coefficient estimates for the wage equation: E(Y X) = X 1 β X d β d = X β Introduction - Introduction -2 Introduction Linear Regression E(Y X) = X β +...+X d β d = X β Example: Wage equation Y = log wages, X = schooling (measured in years), labor market experience (measured

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Han Liu John Lafferty Larry Wasserman Statistics Department Computer Science Department Machine Learning Department Carnegie Mellon

More information

Econ 582 Nonparametric Regression

Econ 582 Nonparametric Regression Econ 582 Nonparametric Regression Eric Zivot May 28, 2013 Nonparametric Regression Sofarwehaveonlyconsideredlinearregressionmodels = x 0 β + [ x ]=0 [ x = x] =x 0 β = [ x = x] [ x = x] x = β The assume

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

More information

J. Cao and W. Yao. Simon Fraser University and Kansas State University

J. Cao and W. Yao. Simon Fraser University and Kansas State University SEMIPARAMETRIC MIXTURE OF BINOMIAL REGRESSION WITH A DEGENERATE COMPONENT J. Cao and W. Yao Simon Fraser University and Kansas State University Abstract: Many historical datasets contain a large number

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

Test for Discontinuities in Nonparametric Regression

Test for Discontinuities in Nonparametric Regression Communications of the Korean Statistical Society Vol. 15, No. 5, 2008, pp. 709 717 Test for Discontinuities in Nonparametric Regression Dongryeon Park 1) Abstract The difference of two one-sided kernel

More information

Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US

Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US Gerdie Everaert 1, Lorenzo Pozzi 2, and Ruben Schoonackers 3 1 Ghent University & SHERPPA 2 Erasmus

More information

CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

More information

Machine Learning Practice Page 2 of 2 10/28/13

Machine Learning Practice Page 2 of 2 10/28/13 Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes

More information

f(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain

f(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain 0.1. INTRODUCTION 1 0.1 Introduction R. A. Fisher, a pioneer in the development of mathematical statistics, introduced a measure of the amount of information contained in an observaton from f(x θ). Fisher

More information

BAYESIAN DECISION THEORY

BAYESIAN DECISION THEORY Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will

More information

Obtaining Critical Values for Test of Markov Regime Switching

Obtaining Critical Values for Test of Markov Regime Switching University of California, Santa Barbara From the SelectedWorks of Douglas G. Steigerwald November 1, 01 Obtaining Critical Values for Test of Markov Regime Switching Douglas G Steigerwald, University of

More information

Model-based cluster analysis: a Defence. Gilles Celeux Inria Futurs

Model-based cluster analysis: a Defence. Gilles Celeux Inria Futurs Model-based cluster analysis: a Defence Gilles Celeux Inria Futurs Model-based cluster analysis Model-based clustering (MBC) consists of assuming that the data come from a source with several subpopulations.

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions

More information