A New Regression Model: Modal Linear Regression
Weixin Yao and Longhai Li

Abstract: The population mode has long been regarded as an important feature of data, and it is traditionally estimated with a nonparametric kernel density estimator. However, little research has applied the mode concept when a regression structure is imposed. This article develops a new data analysis tool, called modal linear regression, for exploring high-dimensional data. Modal linear regression assumes that the mode of the conditional density of Y given predictors x is a linear function of x. It is distinguished from conventional linear regression in that it measures the center by the most likely conditional value rather than the conditional average. We will argue that modal regression, besides its robustness, gives a more meaningful point prediction and a larger coverage probability for prediction than mean regression when the error density is skewed, provided short intervals of the same length, centered around each estimate, are used. We propose an EM-type algorithm for the proposed modal linear regression and establish its asymptotic properties systematically under general error density assumptions, including skewed error densities. Applications to simulated and real data demonstrate that the proposed modal regression has better predictive performance than mean regression.

Weixin Yao is Assistant Professor, Department of Statistics, Kansas State University, Manhattan, Kansas 66506, U.S.A. (wxyao@ksu.edu). Longhai Li is Assistant Professor, Department of Mathematics and Statistics, University of Saskatchewan, Saskatoon, Saskatchewan, S7N5E6, Canada (longhai@math.usask.ca). The research of Longhai Li is supported by funding from the Natural Sciences and Engineering Research Council of Canada and the Canadian Foundation of Innovations.
Key Words: Forest fires data; Linear regression; Modal regression; Mode.

1. Introduction

The mode of a distribution is regarded as an important feature of data. Several authors have made efforts to identify the modes of population distributions for low-dimensional data; see, for example, Muller and Sawitzki (1991); Scott (1992); Friedman and Fisher (1999); Chaudhuri and Marron (1999); Fisher and Marron (2001); Davies and Kovac (2004); Hall, Minnotte, and Zhang (2004); Ray and Lindsay (2005); Yao and Lindsay (2009). (See the command npconmode in the R package np for nonparametric mode estimation.) For high-dimensional data, it is common to impose some model structure, such as an assumption on the conditional distributions. Thus, it is of great interest to study mode hunting for conditional distributions.

Given a random sample {(x_i, y_i), i = 1, ..., n}, where x_i is a p-dimensional column vector, suppose f(y|x) is the conditional density function of Y given x. Conventional regression models use the mean of f(y|x) to investigate the relationship between Y and x, and linear regression assumes that the mean of f(y|x) is a linear function of x. In this article, we propose a new regression model, called modal linear regression, which assumes that the mode of f(y|x) is a linear function of the predictor x. Modal linear regression measures the center by the most likely conditional value rather than the conditional average used by traditional linear regression. Compared to traditional mean regression, the proposed modal regression has the following main advantages.

1. When the error density is highly skewed, modal regression provides a more meaningful location estimator and point prediction than mean regression.

2. Since modal regression focuses on the highest conditional density region, a short interval around the modal regression estimate is expected to have a larger coverage probability for prediction than an interval of the same length around the mean regression estimate.

3. Modal regression can be used to estimate the relationship between variables for the majority of the data when a small proportion of the data are contaminated or do not follow the same relationship as the majority.

4. Due to the robustness of the mode, modal linear regression is also robust to outliers and heavy-tailed error densities.

5. Modal regression has several advantages over traditional robust regression. Traditional robust regression still targets the mean regression function and requires a symmetric error density; it does not have a good model interpretation if the error density is asymmetric. Modal regression, in contrast, does not require the error density to be symmetric and retains its modal interpretation for general error densities.

Therefore, modal regression provides another powerful tool for exploring high-dimensional data, and it can easily be extended to other regression models, such as nonlinear regression, nonparametric regression, and varying coefficient partially linear regression. Quantile regression is another alternative data analysis tool when the data are skewed. However, quantile regression cannot reveal the modal information and might produce low-density point predictions. Modal regression is therefore a potentially very useful addition to the current set of data analysis tools.

We propose a kernel-based objective function to estimate the modal regression line and develop a modal EM algorithm to maximize the proposed objective function. The convergence rate and sampling properties of the proposed estimation procedure are systematically studied. A Monte Carlo simulation study is also conducted to assess the finite-sample performance of the proposed procedure. The
application of the proposed modal regression to predicting forest fires from meteorological data shows that the proposed method has better predictive performance. We also propose a method for modal regression that uses the information in the error density to construct skewed prediction intervals. Leave-one-out cross validation shows that modal regression provides much shorter prediction intervals than mean regression for the forest fires data.

The rest of this article is organized as follows. In Section 2, we introduce the new modal linear regression, develop the modal EM algorithm, and study the asymptotic properties of the resulting estimate. In Section 3, a Monte Carlo simulation study is conducted, and a real data application is used to demonstrate the proposed methodology. Some discussion is given in Section 4. Proofs of the consistency of our estimators and the necessary technical conditions are given in the Appendix.

2. Modal linear regression

2.1. Introduction to modal linear regression

Suppose a response variable Y given a set of predictors x is distributed with conditional density f(y|x). In this article, we propose to use the mode of f(y|x), denoted by Mode(Y|x) = arg max_y f(y|x), to investigate the relationship between Y and x. The proposed modal linear regression method assumes that

Mode(Y|x) = x^T β. (2.1)

We will discuss the extension of modal linear regression (2.1) to other models in Section 4. To include an intercept term in (2.1), we assume that the first element of x is 1. Let ε = y − x^T β and denote by g(ε|x) the conditional density of ε given x; we allow this conditional density to depend on x. Under the model assumption (2.1), g(ε|x) is maximized at 0 for any x. If g(ε|x) is symmetric about 0,
the β in (2.1) will be the same as the conventional linear regression parameter. However, if g(ε|x) is skewed, the two will differ, and it is even possible that the modal regression is a linear function of x while the conventional mean regression function is nonlinear. Let us use an example to illustrate the difference between the modal regression function and the conventional mean regression function when the error is skewed.

Example 1: Let (x, Y) satisfy the model

Y = m(x) + σ(x)ε, (2.2)

where ε has density h(·). Suppose h(·) is a skewed density with mean 0 and mode 1.

a) If m(x) = x^T β and σ(x) = x^T α, then E(Y|x) = x^T β and Mode(Y|x) = x^T(β + α), where Mode(Y|x) denotes the conditional mode of Y given x. Therefore, Y depends on x linearly from the point of view of both mean regression and modal regression, although the regression parameters differ.

b) If m(x) = 0 and σ(x) = x^T α, then E(Y|x) = 0 and Mode(Y|x) = x^T α. Therefore, from the point of view of conventional mean regression, Y does not depend on x, but based on modal regression, Y does depend on x through the modal line.

From this example, we can see that variable selection techniques based on modal regression might reveal useful new predictors compared to mean regression.
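A quick numerical check of case a), using an error density of our own choosing: ε = 2 − Z with Z ~ Gamma(shape 2, scale 1), which has mean 0 and mode 1 as Example 1 requires (the covariate value and the coefficients β, α below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical skewed error: eps = 2 - Z with Z ~ Gamma(shape=2, scale=1),
# so E(eps) = 2 - 2 = 0 and Mode(eps) = 2 - (shape - 1) * scale = 1.
n = 200_000
eps = 2.0 - rng.gamma(shape=2.0, scale=1.0, size=n)

# Case a) of Example 1 at a fixed covariate x = (1, 0.5), with
# illustrative beta = (1, 3) and alpha = (0.5, 1):
beta, alpha = np.array([1.0, 3.0]), np.array([0.5, 1.0])
x = np.array([1.0, 0.5])
y = x @ beta + (x @ alpha) * eps      # x^T beta = 2.5, x^T alpha = 1.0

cond_mean = y.mean()                  # targets E(Y|x) = x^T beta = 2.5

# Conditional mode via a smoothed histogram peak;
# targets Mode(Y|x) = x^T (beta + alpha) = 3.5.
hist, edges = np.histogram(y, bins=500, range=(0.0, 5.0))
smooth = np.convolve(hist, np.ones(15) / 15, mode="same")
centers = 0.5 * (edges[:-1] + edges[1:])
cond_mode = centers[np.argmax(smooth)]
print(cond_mean, cond_mode)
```

The sample mean settles near x^T β while the density peak sits near x^T(β + α), illustrating how the two regression targets separate under a skewed error.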
We propose to estimate the modal regression parameter β in (2.1) by maximizing

Q_h(β) ≡ (1/n) Σ_{i=1}^n φ_h(y_i − x_i^T β), (2.3)

where φ_h(t) = h^{−1} φ(t/h) and φ(t) is a kernel density function. Denote by β̂ the maximizer of (2.3). We call β̂ the modal linear regression (MODLR) estimator. For ease of computation, we use the standard normal density function for φ(t) throughout this paper; see (2.6) below.

To see why (2.3) can be used to estimate the modal regression, take β = β_0, the intercept term only; then (2.3) simplifies to

Q_h(β_0) ≡ (1/n) Σ_{i=1}^n φ_h(y_i − β_0). (2.4)

As a function of β_0, Q_h(·) is the kernel estimate of the density function of y. Therefore, the maximizer of (2.4) is the mode of the kernel density estimate based on y_1, ..., y_n. When n → ∞ and h → 0, the mode of the kernel density estimate converges to the mode of the distribution of y. Such a modal estimator was proposed by Parzen (1962). Here, we use this kernel-type objective function to estimate the modal regression parameter β. We will prove, in Section 2.3, that the β̂ found by maximizing (2.3) is a consistent estimate of the modal regression parameter in (2.1) under very general error densities, including skewed ones, provided h → 0 as n → ∞.

Note that for any fixed β, Q_h(β) in (2.3) is the value at 0 of the kernel density estimate of the residuals ε_i = y_i − x_i^T β. Therefore, by maximizing (2.3) with respect to β, we find a line x^T β̂ such that the value at 0 of the kernel density estimate of the residuals is maximized. If the uniform kernel φ_h(t) = (2h)^{−1} I(|t| ≤ h) is used for φ(·), then (2.3) seeks the line x^T β̂ such that the band x^T β̂ ± h contains the largest number of response values y_i. Therefore, modal regression gives more meaningful prediction than the mean
regression when the data are skewed. Lee (1989) used such a uniform kernel to estimate the modal regression. However, in Lee (1989), h is fixed and does not depend on the sample size n, so a symmetric error is required to obtain a consistent modal line (note that in that case the modal line coincides with the traditional mean regression line). Thus, that modal regression estimator is essentially a robust regression estimate under the assumption of a symmetric error density.

2.2. Modal expectation-maximization algorithm

Note that (2.3) does not have an explicit solution. In this section, we extend the modal expectation-maximization (MEM) algorithm, proposed by Li, Ray, and Lindsay (2007), to maximize (2.3). Similar to an EM algorithm, the MEM algorithm consists of two steps: an E-step and an M-step. Let β^(0) be the initial value. Starting with k = 0, repeat the following two steps until convergence.

E-step: Calculate the weights π(j | β^(k)), j = 1, ..., n, as

π(j | β^(k)) = φ_h(y_j − x_j^T β^(k)) / Σ_{i=1}^n φ_h(y_i − x_i^T β^(k)) ∝ φ_h(y_j − x_j^T β^(k)). (2.5)

M-step: Update β^(k+1) by

β^(k+1) = arg max_β Σ_{j=1}^n π(j | β^(k)) log φ_h(y_j − x_j^T β) = (X^T W_k X)^{−1} X^T W_k y, (2.6)

where X = (x_1, ..., x_n)^T, W_k is the n × n diagonal matrix with diagonal elements π(j | β^(k)), and y = (y_1, ..., y_n)^T.

Note that the function optimized in the M-step is just a weighted sum of log-likelihoods of ordinary linear regression, so an explicit maximizer is available. The following
theorem establishes the ascending property of the proposed MEM algorithm; its proof is given in the Appendix.

Theorem 2.1. Each iteration of (2.5) and (2.6) monotonically non-decreases the objective function (2.3), i.e., Q_h(β^(k+1)) ≥ Q_h(β^(k)) for all k.

Note that the converged value may depend on the starting point, as with usual EM algorithms, and there is no guarantee that the algorithm converges to the global maximizer of (2.3). Therefore, it is prudent to run the algorithm from several starting points and choose the best local optimum found.

From the MEM algorithm, it can be seen that the major difference between the mean regression estimate by least squares (LSE) and the modal regression estimate (MODLR) lies in the E-step. For the LSE, each observation has equal weight. For MODLR, on the other hand, the weight depends on how close y_i is to the modal regression line. This weighting scheme allows MODLR to reduce the effect of observations far from the modal regression line and thus achieve robustness.

The iteratively reweighted least squares (IRWLS) algorithm is commonly used for general M-estimators. Since our modal regression can be considered a special case of M-estimation, the IRWLS algorithm can also be applied to it. When the normal kernel is used for φ(·), the IRWLS algorithm is equivalent to the proposed MEM algorithm, but the two differ when φ(·) is not normal. IRWLS has been proved to have the monotone and convergence property if φ(x)/x is nonincreasing, whereas the proposed MEM algorithm has the monotone property for any kernel density φ(·). In addition, note that φ(x)/x is not nonincreasing if φ(x) is a normal density function. Therefore, the proposed MEM algorithm provides a better explanation of why IRWLS is monotone for the normal kernel density.

2.3. Theoretical properties
Note that the asymptotic properties established for traditional M-estimators require the error density to be symmetric and the objective function to be fixed, since traditional M-estimators still target mean regression. For our proposed modal regression (2.3), the tuning parameter h in the objective function goes to zero, and the error density can be skewed. Therefore, the theoretical results on traditional M-estimators cannot be applied directly to modal regression. In this section, we prove the consistency of the proposed modal regression estimator β̂ for model (2.1) and provide its convergence rate as well as its asymptotic distribution. The proofs are given in the Appendix.

Theorem 2.2. When h → 0 and nh^5 → ∞, under the regularity conditions (A1)-(A3) in the Appendix, there exists a consistent maximizer of (2.3) such that

‖β̂ − β_0‖ = O_p{h^2 + (nh^3)^{−1/2}},

where β_0 is the true coefficient of the modal regression line defined in (2.1).

Theorem 2.3. Under the same assumptions as Theorem 2.2, the β̂ given in Theorem 2.2 has the following asymptotic normality:

√(nh^3) [β̂ − β_0 − (h^2/2) J^{−1} K {1 + o_p(1)}] →_D N{0, ν_2 J^{−1} L J^{−1}}, (2.7)

where ν_2 = ∫ t^2 φ^2(t) dt and

J = E{g''(0|x_i) x_i x_i^T}, K = E{g'''(0|x_i) x_i}, L = E{g(0|x_i) x_i x_i^T}.

Parzen (1962) and Eddy (1980) proved similar asymptotic results for kernel estimators of the mode of the distribution of y without considering a regression effect. Therefore,
the results of Parzen (1962) and Eddy (1980) can be considered special cases of Theorem 2.3 with no predictor involved, i.e., x = 1.

From Theorem 2.3, we can see that the asymptotic bias of β̂ is h^2 J^{−1} K / 2 and the asymptotic variance is ν_2 J^{−1} L J^{−1} / (nh^3). A theoretical optimal bandwidth h for estimating β can be obtained by minimizing the asymptotic weighted mean squared error (MSE)

E{(β̂ − β_0)^T W (β̂ − β_0)} ≈ K^T J^{−1} W J^{−1} K h^4 / 4 + (nh^3)^{−1} ν_2 tr(J^{−1} L J^{−1} W),

where W is a weight matrix, such as the identity matrix, reflecting which coefficients are more important in inference, and tr(A) is the trace of A. Therefore, the asymptotically optimal bandwidth is

ĥ_opt = [3 ν_2 tr(J^{−1} L J^{−1} W) / (K^T J^{−1} W J^{−1} K)]^{1/7} n^{−1/7}. (2.8)

If W = (J^{−1} L J^{−1})^{−1} = J L^{−1} J, which is proportional to the inverse of the asymptotic variance of β̂, then

ĥ_opt = [3 ν_2 (p + 1) / (K^T L^{−1} K)]^{1/7} n^{−1/7}. (2.9)

Let β = (β_0, β_s^T)^T, where β_0 is the scalar intercept parameter and β_s is the slope parameter. If ε is independent of x, then the asymptotic bias h^2 J^{−1} K / 2 = (1, 0, ..., 0)^T g'''(0) h^2 / {2 g''(0)}, and thus the asymptotic bias of the slope parameter β_s is 0. Therefore, the optimal bandwidth h for estimating β_s should go to infinity, which implies that the resulting estimate β̂_s is a least squares estimate with root-n consistency. This is expected, since when ε is independent of x, the slope parameter β_s of the modal regression line is the same as the slope parameter of the conventional mean regression line and thus can be estimated at a root-n convergence rate.
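For the standard normal kernel, ν_2 = ∫ t^2 φ^2(t) dt has the closed form 1/(4√π); the sketch below checks this numerically and evaluates the plug-in bandwidth (2.9) for hypothetical values of K̂ and L̂ (the numbers fed to h_opt are illustrative, not estimates from any data set in the paper):

```python
import math

def phi(t):
    # Standard normal kernel.
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

# nu2 = integral of t^2 phi(t)^2 dt; closed form 1/(4 sqrt(pi)) since
# phi(t)^2 = exp(-t^2)/(2 pi) and the integral of t^2 exp(-t^2) is sqrt(pi)/2.
m, a, b = 40_000, -8.0, 8.0
dt = (b - a) / m
nu2_num = sum((a + i * dt) ** 2 * phi(a + i * dt) ** 2 for i in range(m + 1)) * dt
nu2 = 1.0 / (4.0 * math.sqrt(math.pi))

def h_opt(nu2, p, K, L_inv, n):
    # (2.9): h_opt = [3 * nu2 * (p + 1) / (K^T L^{-1} K)]^{1/7} * n^{-1/7}.
    quad_form = sum(K[i] * sum(L_inv[i][j] * K[j] for j in range(len(K)))
                    for i in range(len(K)))
    return (3.0 * nu2 * (p + 1) / quad_form) ** (1.0 / 7.0) * n ** (-1.0 / 7.0)

# Hypothetical K-hat and L-hat-inverse for p = 1 and n = 200.
h = h_opt(nu2, 1, [0.5, 0.2], [[1.0, 0.0], [0.0, 1.0]], 200)
print(nu2_num, nu2, h)
```

The n^{−1/7} rate is visibly slower than the usual n^{−1/5} density-estimation rate, reflecting the extra derivative of g entering the bias and variance.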
Given the root-n consistent estimate β̂_s (using the LSE, for example), we propose to further estimate β_0 by

β̂_0 = arg max_{β_0} (1/n) Σ_{i=1}^n φ_h(y_i − x_i^T β̂_s − β_0). (2.10)

This maximization can be carried out with the MEM algorithm proposed in Section 2.2. Denoting by β̂_0 the maximizer of (2.10), we have the following result, whose proof is given in the Appendix.

Theorem 2.4. Under the same assumptions as Theorem 2.2, if ε is independent of x and g'''(0) ≠ 0, the β̂_0 defined in (2.10) has the following asymptotic distribution:

√(nh^3) {β̂_0 − β_0 − g'''(0) h^2 / (2 g''(0)) + o_p(h^2)} →_D N{0, g(0) ν_2 / [g''(0)]^2}. (2.11)

Note that when ε is independent of x, J^{−1} L J^{−1} = {g''(0)}^{−2} g(0) E(xx^T)^{−1}. Let

A = E(xx^T) = ( 1, A_12; A_12^T, A_22 )

and let a^{11} be the (1,1) element of A^{−1}. Noting that a^{11} = (1 − A_12 A_22^{−1} A_12^T)^{−1} and that A_22 is positive definite, we have a^{11} ≥ 1. Therefore, based on Theorems 2.3 and 2.4, we can see that using the root-n consistent estimate β̂_s as the initial slope, we obtain a more efficient estimate of the intercept β_0 than the one found by maximizing (2.3) directly. This is reasonable, since the estimate β̂_0 in (2.10) need not account for the uncertainty of β̂_s, thanks to its root-n consistency, and thus β̂_0 is asymptotically as efficient as if β_s were known.

From Theorem 2.4, we can see that the asymptotic bias of β̂_0 is {2 g''(0)}^{−1} g'''(0) h^2 and its asymptotic variance is [{g''(0)}^2 nh^3]^{−1} g(0) ν_2. By minimizing the asymptotic MSE, we
can get the asymptotically optimal bandwidth h for estimating β_0:

ĥ_opt = [3 g(0) ν_2 / {g'''(0)}^2]^{1/7} n^{−1/7}. (2.12)

2.4. Finite sample breakdown point

To better investigate how robust the MODLR estimator is, we also calculate its finite sample breakdown point. The breakdown point quantifies the proportion of bad data in a sample that an estimator can tolerate before returning arbitrary values. Since the breakdown point is usually most useful in a small-sample setup (Donoho, 1982; Donoho and Huber, 1983), we focus on the finite sample breakdown point. A number of definitions for the finite sample breakdown point have been proposed (see, for example, Hampel, 1971, 1974; Donoho, 1982; Donoho and Huber, 1983). In this paper, we work with the finite sample contamination breakdown point.

Let z_i = (x_i, y_i). Given the sample Z = (z_1, ..., z_n), denote by T(Z) the MODLR estimate of β, as defined in (2.3). We can corrupt the original sample Z by adding m arbitrary points Z' = (z_{n+1}, ..., z_{n+m}). The corrupted sample Z ∪ Z' then has sample size n + m and contains a fraction δ = m/(m + n) of bad values. The finite sample contamination breakdown point δ* is defined as

δ*(Z, T) = min_{1 ≤ m ≤ n} { m/(n + m) : sup_{Z'} ‖T(Z ∪ Z')‖ = ∞ }, (2.13)

where ‖·‖ is the Euclidean norm.

Theorem 2.5. Given the observations Z = (z_1, ..., z_n), suppose T(Z) = β̂ is the MODLR estimate defined in (2.3). Let

M = √(2π) h Σ_{i=1}^n φ_h(y_i − x_i^T β̂). (2.14)
Then the finite sample contamination breakdown point of MODLR is

δ*(Z, T) = m* / (n + m*), (2.15)

where m* is an integer satisfying ⌊M⌋ ≤ m* ≤ ⌈M⌉ + 1, ⌊a⌋ is the largest integer not greater than a, and ⌈a⌉ is the smallest integer not less than a.

The proof of Theorem 2.5 is given in the Appendix. From the above theorem, we can see that the breakdown point depends not only on φ(·) and the tuning parameter h, but also on the sample configuration. (However, Huber (1984) pointed out that if the scale (contained in the bandwidth h of the MODLR estimate) is determined from the sample itself, the breakdown point is empirically quite high.)

3. Simulation Study and Application

In this section, we conduct a Monte Carlo simulation to assess the finite-sample performance of the proposed MODLR estimator and compare it with some other regression methods. A real data application is also provided.

To use the modal regression estimator, we first need to select the bandwidth. Note that the asymptotically optimal bandwidth formula (2.9) contains the unknown quantities g^{(v)}(0|x), v = 0, 2, 3, the vth derivatives of the conditional density of ε given x; hence it is not ready to use. A commonly used remedy is to replace the unknown quantities by estimates. Given the initial residuals ε̂_i = y_i − x_i^T β̂, where β̂ is the traditional least squares estimate (or a robust estimate if there are outliers) of β, we can estimate their mode, denoted by m̂, by maximizing the kernel density estimator (Parzen, 1962). Under the assumption of independence of ε and x, ε̂_i − m̂ approximately has density g(·), and thus
g^{(v)}(0|x) can be estimated by (see, for example, Silverman, 1986, and Scott, 1992)

ĝ^{(v)}(0|x) = (1/(n h^{v+1})) Σ_{i=1}^n K^{(v)}{(ε̂_i − m̂)/h}, v = 0, 2, 3,

where K^{(v)}(·) is the vth derivative of the kernel density function K(·). Then we can estimate J, K, and L by Ĵ = n^{−1} Σ_i ĝ''(0|x_i) x_i x_i^T, K̂ = n^{−1} Σ_i ĝ'''(0|x_i) x_i, and L̂ = n^{−1} Σ_i ĝ(0|x_i) x_i x_i^T, and apply formula (2.9) to estimate ĥ_opt. To refine the bandwidth selection, one might further iteratively update the chosen bandwidth by recalculating the residuals ε̂_i based on the modal linear regression estimate.

3.1. A Monte Carlo simulation

Example 1: Generate an iid sample {(x_i, y_i), i = 1, ..., n} from the model

Y = 1 + 3X + σ(X)ε,

where X ~ U(0, 1), ε ~ 0.5 N(−1, σ_1^2) + 0.5 N(1, σ_2^2), and σ(X) = 1 + 2X. Note that E(ε) = 0, Mode(ε) = 1, and Median(ε) = 0.67 (the last two quantities are approximate). The traditional mean regression function is E(Y|X) = 1 + 3X. The proposed modal regression function is Mode(Y|X) = 2 + 5X, with modal regression residual Y − Mode(Y|X) = (1 + 2X)(ε − 1), whose distribution peaks at 0 but is negatively skewed. The median regression function is Median(Y|X) ≈ 1.67 + 4.34X.

We consider and compare the following four methods: 1) traditional mean regression based on the least squares estimate (LSE); 2) median regression (MEDREG); 3) the MM-estimate (an M-estimate with an initial robust M-estimate of scale; Yohai, 1987) based on Tukey's biweight ψ-function; 4) the proposed modal linear regression (MODLR).
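The MODLR fits in this comparison can be computed with the MEM algorithm of Section 2.2; a minimal sketch with the normal kernel is given below. The mixture parameters of the error and the fixed bandwidth h are our own illustrative choices, not the paper's data-driven plug-in rule:

```python
import numpy as np

def modlr_mem(X, y, h, max_iter=300, tol=1e-10):
    """Modal linear regression via the MEM algorithm with a normal kernel.
    E-step: weights pi(j | beta) proportional to phi_h(y_j - x_j^T beta).
    M-step: weighted least squares, beta = (X^T W X)^{-1} X^T W y."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # LSE starting value
    for _ in range(max_iter):
        r = y - X @ beta
        w = np.exp(-0.5 * (r / h) ** 2)           # unnormalized kernel weights
        w /= w.sum()
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Toy data whose modal line is roughly 2 + 5x (illustrative mixture error).
rng = np.random.default_rng(1)
n = 3000
x = rng.uniform(0, 1, n)
eps = np.where(rng.uniform(size=n) < 0.5,
               rng.normal(-1.0, 2.5, n), rng.normal(1.0, 0.5, n))
y = 1 + 3 * x + (1 + 2 * x) * eps
X = np.column_stack([np.ones(n), x])
beta_hat = modlr_mem(X, y, h=0.5)
print(beta_hat)   # should land closer to the modal line (2, 5) than to (1, 3)
```

In practice the algorithm would be restarted from several initial values, as recommended after Theorem 2.1, since the objective can have multiple local maxima.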
Figure 1 is a scatter plot of a typical generated sample with n = 200, together with the four fitted regression lines. From the plot, we can see that the modal line goes through the region containing the greatest number of points; a small prediction band around this line is expected to contain the greatest number of future points. In contrast, the mean regression line based on the LSE is pulled toward a flatter line and lies in a much lower-density area in order to capture the conditional mean. The regression lines based on median regression and the MM-estimate lie in higher-density areas than the line based on the LSE.

Table 1 reports the average and standard error (Std) of the parameter estimates for each method based on 1,000 replicates. From the table, we can see that LSE, MEDREG, and MODLR estimate their target parameters well. However, the MM-estimate cannot estimate the mean regression well, since the assumption of a symmetric error density is violated. Surprisingly, MODLR has a smaller standard error than the other methods in this example with skewed error, especially when n = 200 or n = 400. Therefore, in finite samples, the MODLR estimator not only has a good modal interpretation but may also have better estimation accuracy than other methods when the error is skewed. In addition, MEDREG and the MM-estimate also have smaller standard errors than LSE, due to their robustness.

To see why modal linear regression has better predictive performance than traditional mean regression, Table 2 reports the average and standard error of the estimated coverage probabilities of prediction intervals of the same length centered around each estimate. We consider interval lengths 0.1σ, 0.2σ, and 0.5σ, where σ = 2 is the approximate standard deviation of ε. For each of the 1,000 replications, we estimate the coverage probability by predicting at 1,000 equally spaced predictor values from 0.1 to 0.9.
From Table 2, we can see that MODLR provides higher coverage probabilities than the other three methods. Therefore, when the errors are skewed, a small prediction band around the modal line can have a larger coverage probability than a band of the same length around the mean or median regression line.
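The coverage comparison behind Table 2 can be reproduced in spirit as follows; this is our own sketch of the calculation (a fixed-width band around a candidate line, evaluated on fresh draws from an illustrative skewed model), not the authors' code:

```python
import numpy as np

def band_coverage(beta, width, x_grid, sample_y, n_draws=2000, seed=0):
    """Average, over a predictor grid, of the fraction of fresh responses
    falling inside a band of total length `width` centered at x^T beta."""
    rng = np.random.default_rng(seed)
    cover = 0.0
    for x in x_grid:
        y = sample_y(x, n_draws, rng)          # fresh draws of Y given X = x
        cover += np.mean(np.abs(y - (beta[0] + beta[1] * x)) <= width / 2)
    return cover / len(x_grid)

# Illustrative skewed model (mixture parameters are our own choice):
# Y = 1 + 3X + (1 + 2X) * eps, mean line 1 + 3x, modal line roughly 2 + 5x.
def sample_y(x, m, rng):
    narrow = rng.uniform(size=m) < 0.5
    eps = np.where(narrow, rng.normal(1.0, 0.5, m), rng.normal(-1.0, 2.5, m))
    return 1 + 3 * x + (1 + 2 * x) * eps

grid = np.linspace(0.1, 0.9, 101)
width = 0.5 * 2.0                              # "0.5 sigma" with sigma near 2
cov_mean = band_coverage(np.array([1.0, 3.0]), width, grid, sample_y)
cov_mode = band_coverage(np.array([2.0, 5.0]), width, grid, sample_y)
print(cov_mean, cov_mode)                      # the modal band covers more
```

Because the band around the modal line sits on the highest-density ridge of the conditional distribution, it captures noticeably more fresh responses than an equal-width band around the mean line.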
In addition, MEDREG provides larger coverage probabilities than the MM-estimate and LSE, and the MM-estimate provides larger coverage probabilities than LSE. Note that when the interval length is large enough, the different methods provide similar coverage probabilities.

Figure 1: Scatter plot of a typical sample with n = 200 for Example 1, with the different estimated regression lines: the mean regression line based on LSE, the median regression line, the regression line based on the MM-estimate, and the modal regression line.
Table 1: Average (Std) of parameter estimates over 1,000 repetitions.

Method        Parameter    n=50       n=100          n=200          n=400
LSE           β0 = 1       (0.964)    0.989(0.659)   1.007(0.490)   1.009(0.322)
              β1 = 3       (2.260)    3.063(1.500)   2.977(1.160)   2.976(0.733)
MEDREG        β0 = 1.67    (0.707)    1.613(0.422)   1.636(0.301)   1.667(0.188)
              β1 = 4.34    (1.670)    4.372(0.981)   4.339(0.705)   4.312(0.457)
MM-estimate   β0           (0.782)    1.040(0.530)   1.022(0.376)   1.035(0.265)
              β1           (1.640)    5.234(1.060)   5.271(0.744)   5.271(0.512)
MODLR         β0 = 2       (0.670)    1.841(0.372)   1.875(0.229)   1.912(0.140)
              β1 = 5       (1.750)    5.024(0.948)   5.044(0.574)   5.020(0.387)

Table 2: Average (Std) of coverage probabilities over 1,000 repetitions with σ = 2.

Width   Method        n=50           n=100          n=200          n=400
0.1σ    LSE           0.034(0.015)   0.032(0.011)   0.030(0.009)   0.029(0.007)
        MEDREG        0.073(0.018)   0.077(0.014)   0.078(0.012)   0.080(0.010)
        MM-estimate   0.065(0.023)   0.067(0.019)   0.066(0.015)   0.067(0.012)
        MODLR         0.087(0.016)   0.092(0.012)   0.095(0.010)   0.095(0.009)
0.2σ    LSE           0.069(0.028)   0.065(0.022)   0.061(0.015)   0.059(0.013)
        MEDREG        0.144(0.033)   0.153(0.024)   0.155(0.019)   0.158(0.015)
        MM-estimate   0.129(0.042)   0.133(0.034)   0.132(0.027)   0.134(0.021)
        MODLR         0.170(0.027)   0.179(0.018)   0.184(0.013)   0.186(0.012)
0.5σ    LSE           0.186(0.062)   0.181(0.047)   0.174(0.035)   0.171(0.028)
        MEDREG        0.338(0.061)   0.355(0.040)   0.360(0.029)   0.365(0.022)
        MM-estimate   0.313(0.080)   0.322(0.062)   0.325(0.046)   0.330(0.036)
        MODLR         0.378(0.049)   0.395(0.029)   0.404(0.018)   0.407(0.015)

3.2. The forest fires data

The occurrence of forest fires, also called wildfires, is a major concern for forest preservation and causes ecological and economic damage. Fast detection is vital for successful firefighting. Since traditional human surveillance and automatic solutions based on satellites or infrared/smoke scanners are expensive, the use of low-cost meteorological data to warn the public of fire danger and to support fire management decisions has received much attention. In this section, we illustrate the proposed methodology with an analysis of forest fire data (Cortez and Morais, 2007). The data are available at: pcortez/forestfires/.

This forest fire data set contains 517 observations collected from January 2000 to December 2003 in the Montesinho natural park of the Trás-os-Montes northeast region of Portugal. Every time a forest fire occurred, many features were recorded, such as time, date, spatial location, and weather conditions. Following Cortez and Morais (2007), we use the four meteorological variables outside temperature (temp), outside relative humidity (RH), outside wind speed (wind), and outside rain (rain) to predict the total burned area (area). We fit the data by LSE, MEDREG, the MM-estimate, and MODLR. One important feature of this data set is that it contains outliers and a positively skewed response variable (area). Therefore, it is expected that the traditional mean regression based on the LSE will not work well.

To see why MODLR has better prediction properties than the other methods, we compare the widths of the prediction intervals with the same confidence level. Suppose we have obtained the parameter estimate β̂ and the corresponding residuals ε̂_1, ..., ε̂_n from fitting the data, and let ε̂_[i] denote the ith smallest value of the residuals. The traditional method of constructing a prediction interval for a new predictor x_new is

(x_new^T β̂ + ε̂_[n_1], x_new^T β̂ + ε̂_[n_2]),

where n_1 = ⌊nα/2⌋ and n_2 = n − n_1. This symmetric method is ideal if the regression error is symmetric. Since MODLR focuses on the highest conditional density region, we propose a method for MODLR that uses the information in the skewed error density to construct prediction intervals with (1 − α)100% confidence level. Suppose f̂(·) is the kernel density estimate of ε based on the residuals ε̂_1, ..., ε̂_n estimated by MODLR. We propose to find indexes k_1 < k_2 such that k_2 − k_1 = n_2 − n_1 ≈ n(1 − α) and f̂(ε̂_[k_1]) ≈ f̂(ε̂_[k_2]).
Then the proposed prediction interval for a new predictor x_new is (x_new^T β̂ + ε̂_[k_1], x_new^T β̂ + ε̂_[k_2]).
Table 3: Average widths (percentage of coverage) of the prediction intervals.

                          Nominal confidence levels
Method        10%            30%            50%            90%
LSE           2.166(0.101)   6.687(0.294)   12.70(0.493)   53.03(0.896)
MEDREG        0.975(0.091)   2.638(0.292)   6.506(0.491)   48.52(0.894)
MM-estimate   1.144(0.099)   2.910(0.294)   6.499(0.497)   48.49(0.906)
MODLR         0.012(0.112)   0.035(0.311)   0.571(0.499)   26.44(0.899)

We propose the following iterative algorithm to find the indexes k_1 and k_2. Let k_1 = n_1 and k_2 = n_2 be the initial values.

Step 1: If f̂(ε̂_[k_1]) < f̂(ε̂_[k_2]) and f̂(ε̂_[k_1+1]) < f̂(ε̂_[k_2+1]), set k_1 = k_1 + 1 and k_2 = k_2 + 1; if f̂(ε̂_[k_1]) > f̂(ε̂_[k_2]) and f̂(ε̂_[k_1−1]) > f̂(ε̂_[k_2−1]), set k_1 = k_1 − 1 and k_2 = k_2 − 1.

Step 2: Iterate the above procedure until neither of the two conditions is satisfied or (k_1 − 1)(k_2 − n) = 0.

In Table 3, we report the average widths and the actual coverage rates of the prediction intervals for the 10%, 30%, 50%, and 90% confidence levels. The actual coverage rates are estimated by leave-one-out cross validation. From Table 3, we have the following findings:

1. All the prediction intervals are well calibrated: the actual coverage rates are very close to the nominal confidence levels.

2. The average widths of the prediction intervals constructed around MODLR are much shorter than those of the prediction intervals constructed around the other three estimates.

3. Both MEDREG and the MM-estimate have shorter prediction intervals than LSE.
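The index search in Steps 1-2 slides a fixed-length window over the sorted residuals toward the high-density side. A sketch of this search follows; the kernel density estimate, residual sample, and bandwidth are our own illustrative choices:

```python
import math

def shift_hdr_window(fhat, sorted_resid, n1, n2):
    """Slide the window [k1, k2] over the sorted residuals per Steps 1-2:
    move right while the density at the right end dominates, move left in
    the opposite case, and stop when neither holds or a boundary is hit."""
    n = len(sorted_resid)
    k1, k2 = n1, n2
    while 0 < k1 and k2 < n - 1:   # 0-based indexes; boundary stopping rule
        f1, f2 = fhat(sorted_resid[k1]), fhat(sorted_resid[k2])
        if f1 < f2 and fhat(sorted_resid[k1 + 1]) < fhat(sorted_resid[k2 + 1]):
            k1, k2 = k1 + 1, k2 + 1
        elif f1 > f2 and fhat(sorted_resid[k1 - 1]) > fhat(sorted_resid[k2 - 1]):
            k1, k2 = k1 - 1, k2 - 1
        else:
            break
    return k1, k2

# Illustrative right-skewed residuals (dense near the left end).
resid = sorted(((i / 100.0) ** 2) * 10 - 2 for i in range(101))
h = 0.8
def fhat(t):
    # Normal-kernel density estimate of the residuals.
    return sum(math.exp(-0.5 * ((t - r) / h) ** 2) for r in resid) / (
        len(resid) * h * math.sqrt(2.0 * math.pi))

alpha = 0.2
n1 = int(len(resid) * alpha / 2)
n2 = len(resid) - 1 - n1
k1, k2 = shift_hdr_window(fhat, resid, n1, n2)
print(resid[k1], resid[k2])   # endpoints of the shifted, skewed interval
```

Both branches move the endpoints together, so the window length, and hence the nominal coverage, is preserved while the interval migrates into the highest-density region of the residual distribution.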
4. Summary and Discussion

In this article, we have proposed a powerful new data analysis tool, called modal regression, for exploring the relationship between a response variable and a set of covariates. Modal regression investigates this relationship through the most likely conditional values instead of the conditional mean used by traditional regression techniques. When the error is skewed, modal regression provides a more meaningful prediction than the LSE. Our simulation results demonstrate that modal regression attains a larger coverage probability for prediction than some other commonly used regression methods when short intervals of the same length, centered around each estimate, are used.

Further research could address how to construct the shortest (skewed) prediction interval for a given confidence level using the information in the skewed error density. One related work is Kim and Lindsay (2011), who proposed using confidence distribution sampling to visualize confidence sets. In the forest fire data application, we provided one possible way to construct a skewed prediction interval for MODLR. Based on the cross-validation results, the proposed skewed prediction intervals for MODLR were much shorter than the prediction intervals constructed by some other commonly used regression methods for the forest fire data.

Modal linear regression assumes that the mode of the conditional density of Y given x is a linear function of x. The idea of modal linear regression can easily be generalized to other models, such as nonlinear regression, nonparametric regression, and varying coefficient partially linear regression. In addition, it will also be interesting to study how to select informative variables based on the modal regression idea. These are directions for our future research.

Appendix

The following technical conditions are imposed in this section.
(A1) g^(v)(t|x), v = 0, 1, 2, 3, is continuous in a neighborhood of 0, and g′(0|x) = 0 for any x.

(A2) n^{−1} Σ_{i=1}^n g″(0|x_i) x_i x_i^T = J + o_p(1), n^{−1} Σ_{i=1}^n g‴(0|x_i) x_i = K + o_p(1), and n^{−1} Σ_{i=1}^n g(0|x_i) x_i x_i^T = L + o_p(1), where J < 0, i.e., J is a negative definite matrix.

(A3) n^{−1} Σ_{i=1}^n ‖x_i‖^4 = O_p(1).

The conditions above are mild and are fulfilled in many applications.

Proof of Theorem 2.1: Note that

log{Q_h(β^(k+1))} − log{Q_h(β^(k))}
  = log{Σ_{i=1}^n φ_h(y_i − x_i^T β^(k+1))} − log{Σ_{i=1}^n φ_h(y_i − x_i^T β^(k))}
  = log[ Σ_{i=1}^n φ_h(y_i − x_i^T β^(k+1)) / Σ_{j=1}^n φ_h(y_j − x_j^T β^(k)) ]
  = log[ Σ_{i=1}^n π(i|β^(k)) · φ_h(y_i − x_i^T β^(k+1)) / φ_h(y_i − x_i^T β^(k)) ],

where π(i|β^(k)) = φ_h(y_i − x_i^T β^(k)) / Σ_{j=1}^n φ_h(y_j − x_j^T β^(k)). Based on Jensen's inequality applied to the concave log function, we have

log{Q_h(β^(k+1))} − log{Q_h(β^(k))} ≥ Σ_{i=1}^n π(i|β^(k)) log{ φ_h(y_i − x_i^T β^(k+1)) / φ_h(y_i − x_i^T β^(k)) }.

Based on the property of the M step in (2.6), the right-hand side is nonnegative; hence log{Q_h(β^(k+1))} − log{Q_h(β^(k))} ≥ 0 and thus Q_h(β^(k+1)) ≥ Q_h(β^(k)).
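The ascent property just proved is what makes the modal EM iteration practical. The following is a minimal sketch of the iteration (our own illustration, assuming the Gaussian kernel φ_h(t) = h^{−1}φ(t/h), for which the M step maximizing Σ_i π(i|β^(k)) log φ_h(y_i − x_i^T β) reduces to a weighted least-squares update):

```python
import numpy as np

def modal_em(X, y, h, beta0, n_iter=200, tol=1e-8):
    """Modal EM for modal linear regression with a Gaussian kernel.

    E-step: pi_i proportional to phi_h(y_i - x_i' beta), the kernel weight
            of each observation under the current fit.
    M-step: weighted least squares with weights pi_i, which maximizes
            sum_i pi_i * log phi_h(y_i - x_i' beta) for a Gaussian phi.
    """
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.exp(-0.5 * (r / h) ** 2)       # E-step: unnormalized pi(i | beta)
        w /= w.sum()                          # normalize (does not change the M-step)
        WX = w[:, None] * X                   # diag(pi) @ X
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ y)  # weighted least squares
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

By Theorem 2.1, Q_h is nondecreasing along these iterates, so starting the algorithm from, say, the least squares estimate can only improve the kernel objective.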
Proof of Theorem 2.2: Note that φ″_h(t) = h^{−3}(t²/h² − 1)φ(t/h) and φ′_h(t) = −t h^{−3} φ(t/h). Let a_n = (nh³)^{−1/2} + h². It is sufficient to show that for any given η > 0, there exists a large constant c such that

P{ sup_{‖μ‖=c} Q_h(β_0 + a_n μ) < Q_h(β_0) } ≥ 1 − η.   (A.1)

Let X = (x_1, ..., x_n)^T and y = (y_1, ..., y_n)^T. Denote

K_n ≡ ∂Q_h(β_0)/∂β = n^{−1} Σ_{i=1}^n φ′_h(y_i − x_i^T β_0) x_i,   (A.2)

J_n ≡ ∂²Q_h(β_0)/∂β∂β^T = n^{−1} Σ_{i=1}^n φ″_h(y_i − x_i^T β_0) x_i x_i^T,   (A.3)

where Q_h(β) is defined in (2.3) and β_0 is the true parameter value. By directly calculating the mean and variance, we can get

E(J_n | x) = n^{−1} Σ_{i=1}^n g″(0|x_i) x_i x_i^T {1 + o_p(1)} = J{1 + o_p(1)},   Var(J_n | x) = O_p{(nh⁵)^{−1}},
E(K_n | x) = (h²/(2n)) Σ_{i=1}^n g‴(0|x_i) x_i {1 + o_p(1)} = (h²/2) K{1 + o_p(1)},
Cov(K_n | x) = (ν₂/(n²h³)) Σ_{i=1}^n g(0|x_i) x_i x_i^T {1 + o(1)} = (ν₂/(nh³)) L{1 + o_p(1)},   (A.4)

where J = lim n^{−1} Σ_{i=1}^n g″(0|x_i) x_i x_i^T, K = lim n^{−1} Σ_{i=1}^n g‴(0|x_i) x_i, and L = lim n^{−1} Σ_{i=1}^n g(0|x_i) x_i x_i^T. By default, when calculating the variance of a matrix, we take the variance of each element of the matrix. Using the result X = E(X) + O_p({Var(X)}^{1/2}), since nh⁵ → ∞,
J_n = J + o_p(1). Notice that

Q_h(β_0 + a_n μ) − Q_h(β_0) = a_n K_n^T μ + (a_n²/2) μ^T J_n μ − (a_n³/(6nh⁴)) Σ_{i=1}^n φ‴((y_i − x_i^T β*)/h)(x_i^T μ)³ ≡ M₁ + M₂ + M₃,   (A.5)

where ‖μ‖ = c and ‖β* − β_0‖ ≤ c a_n. From (A.4), we get K_n = O_p(a_n) and hence M₁ = O_p(a_n²). Note that M₂ = 0.5 a_n² μ^T J μ {1 + o_p(1)}. Based on the boundedness of φ⁽⁴⁾(t) and ‖β* − β_0‖ ≤ c a_n, we have

φ‴((y_i − x_i^T β*)/h) = φ‴((y_i − x_i^T β_0)/h){1 + o_p(1)}.

Noting that φ‴(t) = (3t − t³)φ(t), based on the Taylor expansion and the symmetry of φ(t), we have

E{φ‴((Y_i − x_i^T β_0)/h)} = O_p(h⁴),   Var{φ‴((Y_i − x_i^T β_0)/h)} = O_p(h).   (A.6)

Since nh⁵ → ∞, we can prove that M₃ = o_p(a_n²). For any η > 0, we can choose c big enough such that the second term M₂ dominates the other two terms in (A.5) with probability at least 1 − η. Since J < 0 (negative definite), Q_h(β_0 + a_n μ) − Q_h(β_0) < 0 with probability at least 1 − η. The result of Theorem 2.2 follows.

Proof of Theorem 2.3: Suppose β̂ is the consistent solution to ∂Q_h(β)/∂β = 0 found in Theorem 2.2. Based on the Taylor expansion, we have

0 = ∂Q_h(β̂)/∂β = K_n + (J_n + L_n)(β̂ − β_0),   (A.7)
where

L_n = (1/(2nh⁴)) Σ_{i=1}^n φ‴((Y_i − β*^T x_i)/h) {(β̂ − β_0)^T x_i} x_i x_i^T,

with ‖β* − β_0‖ ≤ ‖β̂ − β_0‖. Based on the result of (A.7), we have β̂ − β_0 = −(J_n + L_n)^{−1} K_n. Since ‖β̂ − β_0‖ = O_p(a_n), where a_n = (nh³)^{−1/2} + h², similar to the proof for M₃ in (A.5), we have L_n = o_p(1). Hence, based on (A.4), we have β̂ − β_0 = −J^{−1} K_n {1 + o_p(1)}.

Next we prove the asymptotic normality of K*_n = (nh³)^{1/2} K_n. For any unit vector d ∈ R^{p+1}, we prove

{d^T Cov(K*_n) d}^{−1/2} {d^T K*_n − d^T E(K*_n)} →_D N(0, 1).

Let ξ_i = (nh)^{−1/2} φ′((Y_i − x_i^T β_0)/h) d^T x_i. Then d^T K*_n = Σ_{i=1}^n ξ_i. We check Lyapunov's condition. Based on the results in (A.4), we know

Cov(K_n) = (ν₂/(nh³)) L {1 + o(1)}.   (A.8)

Hence Var(d^T K*_n) = nh³ d^T Cov(K_n) d = ν₂ d^T L d + o(1). So we only need to prove nE|ξ₁|³ → 0. Noticing that |d^T x_i|² ≤ ‖x_i‖² ‖d‖² = ‖x_i‖², and that φ′(·) is bounded, we have nE|ξ₁|³ = O{(nh³)^{−1/2}} → 0. So the asymptotic normality of K*_n holds, i.e., we have

(nh³)^{1/2} { K_n − (h²/2) K {1 + o_p(h²)} } →_D N(0, ν₂ L).

Based on Slutsky's theorem, we have

(nh³)^{1/2} [ β̂ − β_0 + (h²/2) J^{−1} K {1 + o_p(h²)} ] →_D N{0, ν₂ J^{−1} L J^{−1}}.
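The variance formula (A.4)/(A.8) can be checked numerically. The following small Monte Carlo sketch is our own sanity check (not from the paper), specializing to the intercept-only model x ≡ 1 with standard normal errors, so that L = g(0) = 1/√(2π); we assume the kernel constant is ν₂ = ∫ t² φ(t)² dt = 1/(4√π) for the Gaussian kernel:

```python
import numpy as np

# Monte Carlo check of Cov(K_n | x) = nu2 * L / (n h^3) {1 + o(1)} in (A.4),
# for the location model x = 1 with errors from g = N(0, 1) (mode at 0).
def phi_h_prime(t, h):
    """phi_h'(t) = -t h^{-3} phi(t/h), the derivative of the rescaled kernel."""
    return -t / h**3 * np.exp(-0.5 * (t / h) ** 2) / np.sqrt(2.0 * np.pi)

rng = np.random.default_rng(0)
n, h, reps = 2000, 0.3, 2000
eps = rng.standard_normal((reps, n))     # reps independent samples of the errors
Kn = phi_h_prime(eps, h).mean(axis=1)    # one realization of K_n per replicate

nu2 = 1.0 / (4.0 * np.sqrt(np.pi))       # assumed kernel constant: int t^2 phi(t)^2 dt
g0 = 1.0 / np.sqrt(2.0 * np.pi)          # g(0) for the standard normal error density
predicted_var = nu2 * g0 / (n * h**3)    # leading term of Var(K_n) from (A.4)
```

For a symmetric error density g‴(0) = 0, so the mean of K_n is negligible here, and the empirical variance of K_n across replicates matches the predicted ν₂ g(0)/(nh³) up to the O(h²) correction.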
Proof of Theorem 2.4: Since β̂_s is root-n consistent, the asymptotic result for β̂_0 is the same as if β̂_s were known, and its asymptotic distribution can be derived from Theorem 2.3 by taking x = 1 and using the independence of ε and x, under which we have J^{−1}K = g‴(0)/g″(0) and J^{−1}LJ^{−1} = g(0)/{g″(0)}². Then the result follows.

Proof of Theorem 2.5: Let φ*(t) = (2π)^{1/2} h φ_h(t). Then M* = Σ_{i=1}^n φ*(y_i − x_i^T β̂), where β̂ = T(Z). Notice that φ*(·) attains its maximum at 0 with φ*(0) = 1, decreases monotonically toward both sides, and φ*(t) → 0 as |t| → ∞.

We first prove that T(Z ∪ Z*) stays bounded if m < M*. Let ξ > 0 be such that m + nξ < M*, and let C be such that φ*(t) ≤ ξ whenever |t| ≥ C. Let β be any vector such that |y − x^T β| ≥ C for all z = (x, y) in Z. Then

Σ_{i=1}^{m+n} φ*(y_i − x_i^T T(Z ∪ Z*)) ≥ M*   (A.9)

and

Σ_{i=1}^{m+n} φ*(y_i − x_i^T β) ≤ nξ + m.   (A.10)

From (A.9) and (A.10), one knows that T(Z ∪ Z*) must satisfy |y − x^T T(Z ∪ Z*)| < C for at least one point in Z, and thus T(Z ∪ Z*) is bounded.

On the other hand, if m > M*, let ξ > 0 be such that m − mξ > M*, and let C be such that φ*(t) ≤ ξ whenever |t| ≥ C. Assume that all the points {(x_{n+1}, y_{n+1}), ..., (x_{n+m}, y_{n+m})} in Z* are identical and satisfy a linear relationship y = x^T β*. Let β be any vector such that |y_{n+1} − x_{n+1}^T β| ≥ C. Then

Σ_{i=1}^{m+n} φ*(y_i − x_i^T β) ≤ M* + mξ,   (A.11)
and

Σ_{i=1}^{m+n} φ*(y_i − x_i^T β*) ≥ m.   (A.12)

From (A.11) and (A.12), one knows that T(Z ∪ Z*) must satisfy |y_{n+1} − x_{n+1}^T T(Z ∪ Z*)| < C. If we let y_{n+1} → ∞ with x_{n+1} fixed, T(Z ∪ Z*) must go off to infinity, and we have breakdown.

References

Chaudhuri, P. and Marron, J. S. (1999). SiZer for Exploration of Structures in Curves. Journal of the American Statistical Association, 94.

Cortez, P. and Morais, A. (2007). A Data Mining Approach to Predict Forest Fires using Meteorological Data. Proceedings of the 13th EPIA Portuguese Conference on Artificial Intelligence, December, Guimaraes, Portugal.

Davies, P. L. and Kovac, A. (2004). Densities, Spectral Densities and Modality. Annals of Statistics, 32.

Donoho, D. L. (1982). Breakdown Properties of Multivariate Location Estimators. Ph.D. qualifying paper, Department of Statistics, Harvard University.

Donoho, D. L. and Huber, P. J. (1983). The Notion of Breakdown Point. In: Doksum, K., Hodges, J. L. (Eds.), A Festschrift for Erich L. Lehmann. Wadsworth, Belmont, CA.

Eddy, W. F. (1980). Optimum Kernel Estimators of the Mode. Annals of Statistics, 8.

Fisher, N. I. and Marron, J. S. (2001). Mode Testing via the Excess Mass Estimate. Biometrika, 88.
Friedman, J. H. and Fisher, N. I. (1999). Bump Hunting in High-Dimensional Data. Statistics and Computing, 9.

Hall, P., Minnotte, M. C., and Zhang, C. (2004). Bump Hunting with Non-Gaussian Kernels. Annals of Statistics, 32.

Hampel, F. R. (1971). A General Qualitative Definition of Robustness. Annals of Mathematical Statistics, 42.

Hampel, F. R. (1974). The Influence Curve and Its Role in Robust Estimation. Journal of the American Statistical Association, 69.

Huber, P. J. (1984). Finite Sample Breakdown Points of M- and P-Estimators. Annals of Statistics, 12.

Kemp, G. C. R. and Santos Silva, J. M. C. (2010). Regression towards the Mode. Department of Economics, University of Essex, Discussion Paper. jmcss/research.html.

Kim, D. and Lindsay, B. G. (2011). Using Confidence Distribution Sampling to Visualize Confidence Sets. Statistica Sinica, 21(2).

Lee, M. J. (1989). Mode Regression. Journal of Econometrics, 42.

Li, J., Ray, S., and Lindsay, B. G. (2007). A Nonparametric Statistical Approach to Clustering via Mode Identification. Journal of Machine Learning Research, 8(8).

Muller, D. W. and Sawitzki, G. (1991). Excess Mass Estimates and Tests for Multimodality. Journal of the American Statistical Association, 86.

Parzen, E. (1962). On Estimation of a Probability Density Function and Mode. The Annals of Mathematical Statistics, 33.
Ray, S. and Lindsay, B. G. (2005). The Topography of Multivariate Normal Mixtures. Annals of Statistics, 33.

Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization. New York: Wiley.

Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.

Yao, W. and Lindsay, B. G. (2009). Bayesian Mixture Labeling by Highest Posterior Density. Journal of the American Statistical Association, 104.

Yohai, V. J. (1987). High Breakdown-Point and High Efficiency Robust Estimates for Regression. The Annals of Statistics, 15.
More informationIndian Statistical Institute
Indian Statistical Institute Introductory Computer programming Robust Regression methods with high breakdown point Author: Roll No: MD1701 February 24, 2018 Contents 1 Introduction 2 2 Criteria for evaluating
More informationSmall Sample Corrections for LTS and MCD
myjournal manuscript No. (will be inserted by the editor) Small Sample Corrections for LTS and MCD G. Pison, S. Van Aelst, and G. Willems Department of Mathematics and Computer Science, Universitaire Instelling
More informationBTRY 4090: Spring 2009 Theory of Statistics
BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)
More informationLabor-Supply Shifts and Economic Fluctuations. Technical Appendix
Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January
More informationA nonparametric method of multi-step ahead forecasting in diffusion processes
A nonparametric method of multi-step ahead forecasting in diffusion processes Mariko Yamamura a, Isao Shoji b a School of Pharmacy, Kitasato University, Minato-ku, Tokyo, 108-8641, Japan. b Graduate School
More informationAFT Models and Empirical Likelihood
AFT Models and Empirical Likelihood Mai Zhou Department of Statistics, University of Kentucky Collaborators: Gang Li (UCLA); A. Bathke; M. Kim (Kentucky) Accelerated Failure Time (AFT) models: Y = log(t
More informationIssues on quantile autoregression
Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides
More informationRegression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St.
Regression Graphics R. D. Cook Department of Applied Statistics University of Minnesota St. Paul, MN 55108 Abstract This article, which is based on an Interface tutorial, presents an overview of regression
More informationLecture 5: Clustering, Linear Regression
Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 Hierarchical clustering Most algorithms for hierarchical clustering
More informationESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS
Statistica Sinica 13(2003), 1201-1210 ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS Liqun Wang University of Manitoba Abstract: This paper studies a minimum distance moment estimator for
More informationEmpirical Likelihood Methods for Sample Survey Data: An Overview
AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use
More information