A New Regression Model: Modal Linear Regression


Weixin Yao and Longhai Li

Abstract

The population mode has long been regarded as an important feature of data, and it is traditionally estimated with a nonparametric kernel density estimator. However, little research has applied the mode concept when a regression structure is imposed. This article develops a new data analysis tool, called modal linear regression, for exploring high-dimensional data. Modal linear regression assumes that the mode of the conditional density of Y given the predictors x is a linear function of x. It is distinguished from conventional linear regression in that it measures the center using the most likely conditional values rather than the conditional average. We argue that modal regression, besides its robustness, gives a more meaningful point prediction and a larger coverage probability for prediction than mean regression when the error density is skewed, provided short intervals of the same length, centered around each estimate, are used. We propose an EM-type algorithm for the proposed modal linear regression and systematically establish its asymptotic properties under general error densities, including skewed ones. Applications to simulated and real data demonstrate that the proposed modal regression has better predictive performance than mean regression.

Weixin Yao is Assistant Professor, Department of Statistics, Kansas State University, Manhattan, Kansas 66506, U.S.A. (wxyao@ksu.edu). Longhai Li is Assistant Professor, Department of Mathematics and Statistics, University of Saskatchewan, Saskatoon, Saskatchewan, S7N 5E6, Canada (longhai@math.usask.ca). The research of Longhai Li is supported by funding from the Natural Sciences and Engineering Research Council of Canada and the Canada Foundation for Innovation.

Key Words: Forest fires data; Linear regression; Modal regression; Mode.

1. Introduction

The mode of a distribution is regarded as an important feature of data, and several authors have made efforts to identify the modes of population distributions for low-dimensional data; see, for example, Muller and Sawitzki (1991), Scott (1992), Friedman and Fisher (1999), Chaudhuri and Marron (1999), Fisher and Marron (2001), Davies and Kovac (2004), Hall, Minnotte, and Zhang (2004), Ray and Lindsay (2005), and Yao and Lindsay (2009). (See the function npconmode in the R package np for nonparametric mode estimation.) For high-dimensional data, it is common to impose some model structure, such as an assumption on the conditional distributions. It is therefore of great interest to study mode hunting for conditional distributions.

Given a random sample {(x_i, y_i), i = 1, ..., n}, where x_i is a p-dimensional column vector, suppose f(y | x) is the conditional density function of Y given x. Conventional regression models use the mean of f(y | x) to investigate the relationship between Y and x, and linear regression assumes that the mean of f(y | x) is a linear function of x. In this article, we propose a new regression model, called modal linear regression, which assumes that the mode of f(y | x) is a linear function of the predictor x. Modal linear regression measures the center using the most likely conditional values rather than the conditional average used by traditional linear regression. Compared to traditional mean regression, the proposed modal regression has the following main advantages.

1. When the error density is highly skewed, modal regression provides a more meaningful location estimator and point prediction than mean regression.

2. Since modal regression focuses on the highest conditional density region, a short interval around the modal regression estimate is expected to have a larger coverage probability for prediction than an interval of the same length around the mean regression estimate.

3. Modal regression can be used to estimate the relationship between variables for the majority of the data when a small proportion of the data are contaminated or do not follow the same relationship as the majority.

4. Due to the robustness of the mode, modal linear regression is also robust to outliers and heavy-tailed error densities.

5. Modal regression has several advantages over traditional robust regression. Traditional robust regression still targets the mean regression function and requires the assumption of a symmetric error density; it does not have a good model interpretation if the error density is asymmetric. Modal regression, in contrast, does not require the error density to be symmetric and retains its modal interpretation under general error densities.

Therefore, modal regression provides another powerful tool for exploring high-dimensional data, and it can be easily extended to other regression models, such as nonlinear regression, nonparametric regression, and varying coefficient partially linear regression. Quantile regression is another alternative data analysis tool when the data are skewed. However, quantile regression cannot reveal the modal information and might produce low-density point predictions. Modal regression is therefore a potentially very useful addition to the current data analysis toolbox.

We propose a kernel-based objective function to estimate the modal regression line and develop a modal EM algorithm to maximize it. The convergence rate and sampling properties of the proposed estimation procedure are studied systematically, and a Monte Carlo simulation study is conducted to assess its finite sample performance.

The application of the proposed modal regression to predicting forest fires using meteorological data shows that the proposed method has better predictive performance. We also propose a method that uses information about the error density to construct skewed prediction intervals for modal regression. Leave-one-out cross validation shows that modal regression provides much shorter prediction intervals than mean regression for the forest fires data.

The rest of this article is organized as follows. In Section 2, we introduce the new modal linear regression, develop the modal EM algorithm, and study the asymptotic properties of the resulting estimate. In Section 3, a Monte Carlo simulation study is conducted and a real data application is used to demonstrate the proposed methodology. Some discussion is given in Section 4. Proofs of consistency of our estimators and the necessary technical conditions are given in the Appendix.

2. Modal linear regression

2.1. Introduction to modal linear regression

Suppose a response variable y given a set of predictors x is distributed with a PDF f(y | x). In this article, we propose to use the mode of f(y | x), denoted by Mode(Y | x) = arg max_y f(y | x), to investigate the relationship between Y and x. The proposed modal linear regression method assumes that

    Mode(Y | x) = x^T β.    (2.1)

We will discuss the extension of modal linear regression in (2.1) to some other models in Section 4. To include an intercept term in (2.1), we assume that the first element of x is 1. Let ε = y − x^T β and denote by g(ε | x) the conditional density of ε given x; we allow this conditional density to depend on x. Under the model assumption (2.1), g(ε | x) is maximized at 0 for any x. If g(ε | x) is symmetric about 0,

the β in (2.1) will be the same as the conventional linear regression parameter. However, if g(ε | x) is skewed, the two will differ, and it is even possible that the modal regression function is linear in x while the conventional mean regression function is nonlinear. Let us use an example to illustrate the difference between the modal regression function and the conventional mean regression function when the error is skewed.

Example 1: Let (x, Y) satisfy the model

    Y = m(x) + σ(x)ε,    (2.2)

where ε has density h(·). Suppose h(·) is a skewed density with mean 0 and mode 1.

a) If m(x) = x^T β and σ(x) = x^T α, then E(Y | x) = x^T β and Mode(Y | x) = x^T (β + α), where Mode(Y | x) denotes the conditional mode of Y given x. Therefore, Y depends on x linearly from the point of view of both mean regression and modal regression, although the two regression parameters differ.

b) If m(x) = 0 and σ(x) = x^T α, then E(Y | x) = 0 and Mode(Y | x) = x^T α. Therefore, from the point of view of conventional mean regression, Y does not depend on x, but based on modal regression, Y does depend on x through the modal line.

From this example, we can see that variable selection techniques based on modal regression might reveal useful predictors that mean regression misses.

We propose to estimate the modal regression parameter β in (2.1) by maximizing

    Q_h(β) ≡ (1/n) Σ_{i=1}^n φ_h(y_i − x_i^T β),    (2.3)

where φ_h(t) = h^{-1} φ(t/h) and φ(t) is a kernel density function. Denote by β̂ the maximizer of (2.3); we call β̂ the modal linear regression (MODLR) estimator. For ease of computation, we use the standard normal density function for φ(t) throughout this paper; see (2.6) below.

To see why (2.3) can be used to estimate the modal regression, take β = β_0, the intercept term only; then (2.3) simplifies to

    Q_h(β_0) ≡ (1/n) Σ_{i=1}^n φ_h(y_i − β_0).    (2.4)

As a function of β_0, Q_h(·) is the kernel estimate of the density function of y. Therefore, the maximizer of (2.4) is the mode of the kernel density estimate based on y_1, ..., y_n. When n → ∞ and h → 0, the mode of the kernel density estimate converges to the mode of the distribution of y. Such a mode estimator was proposed by Parzen (1962). Here, we use this kernel-type objective function to estimate the modal regression parameter β. We will prove, in Section 2.3, that the β̂ found by maximizing (2.3) is a consistent estimate of the modal regression parameter in (2.1) under very general error densities, including skewed ones, provided h → 0 as n → ∞.

Note that for any fixed β, Q_h(β) in (2.3) is the value at 0 of the kernel density estimate of the residuals ε_i = y_i − x_i^T β. Therefore, by maximizing (2.3) with respect to β, we find a line x^T β̂ such that the kernel density of the residuals at 0 is maximized. If a uniform kernel, φ_h(t) = (2h)^{-1} I(|t| ≤ h), is used for φ(·), then (2.3) seeks the line x^T β̂ such that the band x^T β̂ ± h contains the largest number of response values y_i. Therefore, modal regression gives a more meaningful prediction than mean regression when the data are skewed.
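To make the objective concrete, here is a minimal numerical sketch of (2.3) and of its intercept-only special case (2.4), which reduces to Parzen's kernel mode estimator. The toy sample, the bandwidth h = 0.3, and the grid search are illustrative choices, not part of the paper.

```python
import numpy as np

def phi_h(t, h):
    # phi_h(t) = h^{-1} phi(t / h), with phi the standard normal density
    return np.exp(-0.5 * (t / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def q_h(beta, X, y, h):
    # Kernel objective (2.3): the kernel density estimate of the residuals at 0
    return np.mean(phi_h(y - X @ beta, h))

# Intercept-only case (2.4): maximizing over beta_0 recovers the kernel-density
# mode of y (Parzen, 1962), approximated here by a simple grid search.
rng = np.random.default_rng(1)
y = rng.gamma(2.0, 1.0, 500)                       # skewed toy sample
grid = np.linspace(y.min(), y.max(), 1000)
mode_hat = grid[np.argmax([np.mean(phi_h(y - b0, 0.3)) for b0 in grid])]
```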

Lee (1989) used such a uniform kernel to estimate the modal regression. However, in Lee (1989), h is fixed and does not depend on the sample size n, so the error density must be symmetric for the estimated modal line to be consistent (in which case the modal line coincides with the traditional mean regression line). Thus, their modal regression estimator is essentially a robust regression estimate under the assumption of a symmetric error density.

2.2. Modal expectation-maximization algorithm

Note that (2.3) has no explicit solution. In this section, we extend the modal expectation-maximization (MEM) algorithm, proposed by Li, Ray, and Lindsay (2007), to maximize (2.3). Similar to an EM algorithm, the MEM algorithm consists of two steps: an E-step and an M-step. Let β^(0) be the initial value. Starting with k = 0, repeat the following two steps until convergence.

E-step: Calculate the weights π(j | β^(k)), j = 1, ..., n, as

    π(j | β^(k)) = φ_h(y_j − x_j^T β^(k)) / Σ_{i=1}^n φ_h(y_i − x_i^T β^(k)) ∝ φ_h(y_j − x_j^T β^(k)).    (2.5)

M-step: Update β^(k+1) as

    β^(k+1) = arg max_β Σ_{j=1}^n π(j | β^(k)) log φ_h(y_j − x_j^T β) = (X^T W_k X)^{-1} X^T W_k y,    (2.6)

where X = (x_1, ..., x_n)^T, W_k is the n × n diagonal matrix with diagonal elements π(j | β^(k)), and y = (y_1, ..., y_n)^T. Note that the function optimized in the M-step is a weighted sum of log-likelihoods of ordinary linear regression, so an explicit maximizer is available.
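Below is a minimal sketch of the MEM iteration, assuming the normal kernel so that the M-step is exactly the weighted least squares fit in (2.6). The ordinary least squares starting value, the iteration cap, and the tolerance are illustrative choices.

```python
import numpy as np

def mem_modlr(X, y, h, beta_init=None, max_iter=500, tol=1e-8):
    """Modal EM for the kernel objective (2.3) with a standard normal kernel.
    X is the n x (p+1) design matrix (first column all ones); y the responses."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0] if beta_init is None else beta_init
    for _ in range(max_iter):
        # E-step (2.5): pi(j | beta) proportional to phi_h(y_j - x_j' beta);
        # the constant factor of phi_h cancels in the normalization.
        r = y - X @ beta
        w = np.exp(-0.5 * (r / h) ** 2)
        w = w / w.sum()
        # M-step (2.6): weighted least squares, (X' W X)^{-1} X' W y.
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Since Theorem 2.1 below guarantees ascent only to a local maximum, in practice one would call mem_modlr from several starting values and keep the fit with the largest value of Q_h.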

The following theorem establishes the ascent property of the proposed MEM algorithm; its proof is given in the Appendix.

Theorem 2.1. Each iteration of (2.5) and (2.6) monotonically non-decreases the objective function (2.3), i.e., Q_h(β^(k+1)) ≥ Q_h(β^(k)) for all k.

Note that, as with usual EM algorithms, the converged value may depend on the starting point, and there is no guarantee that the algorithm converges to the global maximizer of (2.3). Therefore, it is prudent to run the algorithm from several starting points and choose the best local optimum found.

From the MEM algorithm, it can be seen that the major difference between the mean regression estimate by least squares (LSE) and the modal regression estimate (MODLR) lies in the E-step. For the LSE, every observation has equal weight, whereas for MODLR the weight depends on how close y_i is to the modal regression line. This weighting scheme allows MODLR to reduce the effect of observations far from the modal regression line and thereby achieve robustness.

The iteratively reweighted least squares (IRWLS) algorithm is commonly used for general M-estimators. Since our modal regression can be considered a special case of M-estimation, the IRWLS algorithm can also be applied to the proposed modal regression. When the normal kernel is used for φ(·), the IRWLS algorithm is equivalent to the proposed MEM algorithm, but the two differ if φ(·) is not normal. IRWLS has been proved to be monotone and convergent if φ(x)/x is nonincreasing, whereas the proposed MEM algorithm has the monotone property for any kernel density φ(·). In addition, note that φ(x)/x is not nonincreasing when φ(x) is a normal density function. Therefore, the proposed MEM algorithm provides a better explanation of why IRWLS is monotone for the normal kernel density.

2.3. Theoretical properties

Note that the asymptotic properties established for traditional M-estimators require the error density to be symmetric and the objective function to be fixed, since traditional M-estimators still target mean regression. For our proposed modal regression (2.3), the tuning parameter h in the objective function goes to zero and the error density can be skewed. Therefore, theoretical results for traditional M-estimators cannot be applied directly to modal regression. In this section, we prove the consistency of the proposed modal regression estimator β̂ for model (2.1) and provide its convergence rate as well as its asymptotic distribution. The proofs are given in the Appendix.

Theorem 2.2. When h → 0 and nh^5 → ∞, under the regularity conditions (A1)-(A3) in the Appendix, there exists a consistent maximizer of (2.3) such that

    ‖β̂ − β_0‖ = O_p{h^2 + (nh^3)^{-1/2}},

where β_0 is the true coefficient of the modal regression line defined in (2.1).

Theorem 2.3. Under the same assumptions as Theorem 2.2, the β̂ given in Theorem 2.2 satisfies the asymptotic normality result

    √(nh^3) [β̂ − β_0 − (h^2/2) J^{-1} K {1 + o_p(1)}] →_D N{0, ν_2 J^{-1} L J^{-1}},    (2.7)

where ν_2 = ∫ t^2 φ^2(t) dt and

    J = E{g''(0 | x_i) x_i x_i^T},  K = E{g'''(0 | x_i) x_i},  L = E{g(0 | x_i) x_i x_i^T}.

Parzen (1962) and Eddy (1980) proved similar asymptotic results for kernel estimators of the mode of the distribution of y without considering a regression effect.

Therefore, the results of Parzen (1962) and Eddy (1980) can be considered a special case of Theorem 2.3 with no predictor involved, i.e., x = 1.

From Theorem 2.3, we can see that the asymptotic bias of β̂ is h^2 J^{-1} K / 2 and the asymptotic variance is ν_2 J^{-1} L J^{-1} / (nh^3). A theoretical optimal bandwidth h for estimating β can be obtained by minimizing the asymptotic weighted mean squared error (MSE)

    E{(β̂ − β_0)^T W (β̂ − β_0)} ≈ K^T J^{-1} W J^{-1} K h^4 / 4 + (nh^3)^{-1} ν_2 tr(J^{-1} L J^{-1} W),

where W is a weight matrix, such as the identity matrix, reflecting which coefficients are more important in the inference, and tr(A) is the trace of A. Therefore, the asymptotically optimal bandwidth is

    ĥ_opt = [3 ν_2 tr(J^{-1} L J^{-1} W) / {K^T J^{-1} W J^{-1} K}]^{1/7} n^{-1/7}.    (2.8)

If W = (J^{-1} L J^{-1})^{-1} = J L^{-1} J, which is proportional to the inverse of the asymptotic variance of β̂, then

    ĥ_opt = [3 ν_2 (p + 1) / {K^T L^{-1} K}]^{1/7} n^{-1/7}.    (2.9)

Let β = (β_0, β_s^T)^T, where β_0 is the scalar intercept parameter and β_s is the slope parameter. If ε is independent of x, then the asymptotic bias h^2 J^{-1} K / 2 = (1, 0, ..., 0)^T g'''(0) h^2 / {2g''(0)}, and thus the asymptotic bias of the slope parameter β_s is 0. Therefore, the optimal bandwidth h for estimating β_s should go to infinity, which implies that the resulting estimate β̂_s is a least squares type estimate with root-n consistency. This is expected: when ε is independent of x, the slope parameter β_s of the modal regression line is the same as the slope parameter of the conventional mean regression line and can thus be estimated at the root-n convergence rate.

Given the root-n consistent estimate β̂_s (obtained by LSE, for example), we propose to further estimate β_0 by

    β̂_0 = arg max_{β_0} (1/n) Σ_{i=1}^n φ_h(y_i − x_i^T β̂_s − β_0).    (2.10)

The above maximization can be carried out similarly using the MEM algorithm proposed in Section 2.2. Denoting by β̂_0 the maximizer of (2.10), we have the following result; its proof is given in the Appendix.

Theorem 2.4. Under the same assumptions as Theorem 2.2, if ε is independent of x and g'''(0) ≠ 0, the β̂_0 defined in (2.10) has the asymptotic distribution

    √(nh^3) {β̂_0 − β_0 − g'''(0) h^2 / (2g''(0)) + o_p(h^2)} →_D N{0, g(0) ν_2 / [g''(0)]^2}.    (2.11)

Note that when ε is independent of x, J^{-1} L J^{-1} = g(0) {g''(0)}^{-2} E(xx^T)^{-1}. Let

    A = E(xx^T) = \begin{pmatrix} 1 & A_{12} \\ A_{12}^T & A_{22} \end{pmatrix}

and let a^{11} be the (1,1) element of A^{-1}. Noting that a^{11} = (1 − A_{12} A_{22}^{-1} A_{12}^T)^{-1} and that A_{22} is positive definite, we have a^{11} ≥ 1. Therefore, based on Theorems 2.3 and 2.4, using the root-n consistent estimate β̂_s as the initial value gives a more efficient estimate of the intercept β_0 than maximizing (2.3) directly. This is reasonable, since the estimate β̂_0 in (2.10) need not account for the uncertainty of β̂_s, due to its root-n consistency, and thus β̂_0 is asymptotically as efficient as if β_s were known. From Theorem 2.4, the asymptotic bias of β̂_0 is {2g''(0)}^{-1} g'''(0) h^2 and its asymptotic variance is [{g''(0)}^2 nh^3]^{-1} g(0) ν_2.
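The following is a minimal sketch of this two-step procedure under the independence assumption. The OLS first step and the grid search over the intercept are illustrative stand-ins; the paper maximizes (2.10) with the MEM algorithm.

```python
import numpy as np

def two_step_modlr(X, y, h, grid_size=500):
    """Two-step estimator built around (2.10): root-n consistent slopes from
    ordinary least squares, then a one-dimensional mode search for the
    intercept over the partial residuals."""
    slope = np.linalg.lstsq(X, y, rcond=None)[0][1:]   # discard the OLS intercept
    r = y - X[:, 1:] @ slope                           # partial residuals y_i - x_i' beta_s
    grid = np.linspace(r.min(), r.max(), grid_size)
    dens = [np.mean(np.exp(-0.5 * ((r - b0) / h) ** 2)) for b0 in grid]
    b0_hat = grid[int(np.argmax(dens))]
    return np.concatenate(([b0_hat], slope))
```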

By minimizing the asymptotic MSE of β̂_0, we obtain the asymptotically optimal bandwidth for estimating β_0:

    ĥ_opt = [3 g(0) ν_2 / {g'''(0)}^2]^{1/7} n^{-1/7}.    (2.12)

2.4. Finite sample breakdown point

To better investigate how robust the MODLR estimator is, we also calculate its finite sample breakdown point. The breakdown point quantifies the proportion of bad data in a sample that an estimator can tolerate before returning arbitrary values. Since the breakdown point is usually most useful in a small-sample setup (Donoho, 1982; Donoho and Huber, 1983), we focus on the finite sample breakdown point. A number of definitions of the finite sample breakdown point have been proposed (see, for example, Hampel, 1971, 1974; Donoho, 1982; Donoho and Huber, 1983). In this paper, we work with the finite sample contamination breakdown point.

Let z_i = (x_i, y_i). Given the sample Z = (z_1, ..., z_n), denote by T(Z) the MODLR estimate of β defined by (2.3). We can corrupt the original sample Z by adding m arbitrary points Z' = (z_{n+1}, ..., z_{n+m}). The corrupted sample Z ∪ Z' then has sample size n + m and contains a fraction δ = m/(m + n) of bad values. The finite sample contamination breakdown point δ* is defined as

    δ*(Z, T) = min_{1 ≤ m ≤ n} { m/(n + m) : sup_{Z'} ‖T(Z ∪ Z')‖ = ∞ },    (2.13)

where ‖·‖ is the Euclidean norm.

Theorem 2.5. Given the observations Z = (z_1, ..., z_n), suppose T(Z) = β̂ is the MODLR estimate defined by (2.3). Let

    M = √(2π) h Σ_{i=1}^n φ_h(y_i − x_i^T β̂).    (2.14)

Then the finite sample contamination breakdown point of MODLR is

    δ*(Z, T) = m*/(n + m*),    (2.15)

where m* is an integer satisfying ⌈M⌉ ≤ m* ≤ ⌊M⌋ + 1, ⌊a⌋ is the largest integer not greater than a, and ⌈a⌉ is the smallest integer not less than a.

The proof of Theorem 2.5 is given in the Appendix. From the above theorem, we can see that the breakdown point depends not only on φ(·) and the tuning parameter h, but also on the sample configuration. (However, Huber (1984) pointed out that if the scale, contained in the bandwidth h of the MODLR estimate, is determined from the sample itself, the breakdown point is empirically quite high.)

3. Simulation Study and Application

In this section, we conduct a Monte Carlo simulation to assess the finite-sample performance of the proposed MODLR estimator and compare it with some other regression methods. A real data application is also provided.

To use the modal regression estimator, we first need to select the bandwidth. Note that the asymptotically optimal bandwidth formula (2.9) involves the unknown quantities g^(v)(0 | x), v = 0, 2, 3, the vth derivatives of the conditional density of ε given x, and hence is not ready for use. A commonly used remedy is to replace the unknown quantities by estimates. Given the initial residuals ε̂_i = y_i − x_i^T β̂, where β̂ is the traditional least squares estimate of β (or a robust estimate if there are outliers), we can estimate their mode, denoted by m̂, by maximizing the kernel density estimator (Parzen, 1962). Under the assumption of independence between ε and x, ε̂_i − m̂ approximately has density g(·), and thus

g^(v)(0 | x) can be estimated by (see, for example, Silverman, 1986, and Scott, 1992)

    ĝ^(v)(0 | x) = (1/(n h^{v+1})) Σ_{i=1}^n K^(v)((ε̂_i − m̂)/h),  v = 0, 2, 3,

where K^(v)(·) is the vth derivative of the kernel density function K(·). We can then estimate J, K, and L by

    Ĵ = n^{-1} Σ_{i=1}^n ĝ''(0 | x_i) x_i x_i^T,  K̂ = n^{-1} Σ_{i=1}^n ĝ'''(0 | x_i) x_i,  L̂ = n^{-1} Σ_{i=1}^n ĝ(0 | x_i) x_i x_i^T,

and apply formula (2.9) to compute ĥ_opt. To refine the bandwidth selection, one might further update the chosen bandwidth iteratively by recalculating the residuals ε̂_i based on the modal linear regression estimate.

3.1. A Monte Carlo simulation

Example 1: Generate an iid sample {(x_i, y_i), i = 1, ..., n} from the model

    Y = 1 + 3X + σ(X)ε,

where X ~ U(0, 1), ε ~ 0.5 N(−1, 2.5^2) + 0.5 N(1, 0.5^2), and σ(X) = 1 + 2X. Note that E(ε) = 0, Mode(ε) = 1, and Median(ε) = 0.67 (the last two quantities are approximate). The traditional mean regression function is E(Y | X) = 1 + 3X. The proposed modal regression function is Mode(Y | X) = 2 + 5X, with modal regression residual Y − Mode(Y | X) = (1 + 2X)(ε − 1), whose distribution peaks at 0 but is negatively skewed. The median regression function is Median(Y | X) = 1.67 + 4.34X. We consider and compare the following four methods (a sketch of the data-generating step follows this list):

1) traditional mean regression based on the least squares estimate (LSE);
2) median regression (MEDREG);
3) the MM-estimate (an M-estimate with an initial robust M-estimate of scale, Yohai, 1987) based on Tukey's biweight ψ-function;
4) the proposed modal linear regression (MODLR).
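Here is a minimal sketch of the data-generating step for this example. The component standard deviations 2.5 and 0.5 in the error mixture are reconstructed values consistent with the stated E(ε) = 0, Mode(ε) ≈ 1, Median(ε) ≈ 0.67, and sd(ε) ≈ 2; the seed and sample size are illustrative.

```python
import numpy as np

def generate_example1(n, rng):
    """One sample from Y = 1 + 3X + (1 + 2X) * eps with
    eps ~ 0.5 N(-1, 2.5^2) + 0.5 N(1, 0.5^2) and X ~ U(0, 1)."""
    x = rng.uniform(0.0, 1.0, n)
    from_right = rng.random(n) < 0.5          # pick a mixture component
    eps = np.where(from_right, rng.normal(1.0, 0.5, n), rng.normal(-1.0, 2.5, n))
    return x, 1.0 + 3.0 * x + (1.0 + 2.0 * x) * eps

rng = np.random.default_rng(0)
x, y = generate_example1(200, rng)
X = np.column_stack([np.ones_like(x), x])     # design matrix with intercept
```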

Figure 1 is the scatter plot of a typical generated sample with n = 200, together with the four fitted regression lines. From the plot, we can see that the modal line goes through the region containing the largest number of points; a small prediction band around this line is expected to contain the largest number of future points. In contrast, the mean regression line based on LSE is pulled toward a flatter line and lies in a much lower-density area in order to capture the conditional mean. The regression lines based on median regression and the MM-estimate lie in higher-density areas than the LSE line.

Table 1 reports the average and standard error (Std) of the parameter estimates for each method based on 1,000 replicates. From the table, we can see that LSE, MEDREG, and MODLR estimate their target parameters well. However, the MM-estimate cannot estimate the mean regression well, since the assumption of a symmetric error density is violated. Surprisingly, MODLR has a smaller standard error than the other methods in this example when the error is skewed, especially when n = 200 or n = 400. Therefore, in finite samples, the MODLR estimator not only has a clear modal interpretation but may also have better estimation accuracy than other methods when the error is skewed. In addition, MEDREG and the MM-estimate also have smaller standard errors than LSE, due to their robustness.

To see why modal linear regression has better predictive performance than traditional mean regression, Table 2 reports the average and standard error of the estimated coverage probabilities for prediction based on intervals of the same length centered around each estimate. We consider interval lengths of 0.1σ, 0.2σ, and 0.5σ, where σ = 2 is the approximate standard deviation of ε. For each of the 1,000 replications, we estimate the coverage probability by making predictions at 1,000 equally spaced predictor values from 0.1 to 0.9. From Table 2, we can see that MODLR provides higher coverage probabilities than the other three methods. Therefore, when the errors are skewed, a small prediction band around the modal line can have a larger coverage probability than a band of the same length around the mean regression line or the median regression line.
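The coverage estimation just described can be sketched as follows, reusing the data-generating mechanism above. Centering the band at the fitted line with half-width equal to half the interval length is the natural reading of "intervals of the same length centered around each estimate"; the Monte Carlo size is an illustrative choice.

```python
import numpy as np

def coverage_at(beta_hat, width, rng, n_grid=1000, n_mc=2000):
    """Monte Carlo estimate of the coverage of the fixed-width band
    x' beta_hat +/- width / 2 for future responses from Example 1,
    averaged over n_grid equally spaced predictor values on [0.1, 0.9]."""
    xs = np.linspace(0.1, 0.9, n_grid)
    total = 0.0
    for x in xs:
        from_right = rng.random(n_mc) < 0.5
        eps = np.where(from_right, rng.normal(1.0, 0.5, n_mc), rng.normal(-1.0, 2.5, n_mc))
        y_new = 1.0 + 3.0 * x + (1.0 + 2.0 * x) * eps
        center = beta_hat[0] + beta_hat[1] * x
        total += np.mean(np.abs(y_new - center) <= width / 2.0)
    return total / len(xs)
```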

In addition, MEDREG provides larger coverage probabilities than the MM-estimate and LSE, and the MM-estimate provides larger coverage probabilities than LSE. Note that when the interval length is large enough, the different methods provide similar coverage probabilities.

Figure 1: Scatter plot of a typical sample with n = 200 for Example 1, with the different estimated regression lines: the mean regression line based on LSE, the median regression line, the regression line based on the MM-estimate, and the modal regression line.

Table 1: Average (Std) of parameter estimates over 1,000 repetitions.

Method        Parameter    n=50       n=100          n=200          n=400
LSE           β_0 = 1      (0.964)    0.989(0.659)   1.007(0.490)   1.009(0.322)
              β_1 = 3      (2.260)    3.063(1.500)   2.977(1.160)   2.976(0.733)
MEDREG        β_0 = 1.67   (0.707)    1.613(0.422)   1.636(0.301)   1.667(0.188)
              β_1 = 4.34   (1.670)    4.372(0.981)   4.339(0.705)   4.312(0.457)
MM-estimate   β_0 = 1      (0.782)    1.040(0.530)   1.022(0.376)   1.035(0.265)
              β_1 = 3      (1.640)    5.234(1.060)   5.271(0.744)   5.271(0.512)
MODLR         β_0 = 2      (0.670)    1.841(0.372)   1.875(0.229)   1.912(0.140)
              β_1 = 5      (1.750)    5.024(0.948)   5.044(0.574)   5.020(0.387)

3.2. The forest fires data

The occurrence of forest fires, also called wildfires, is a major concern for forest preservation and causes ecological and economic damage. Fast detection is vital for successful firefighting. Since traditional human surveillance and automatic solutions based on satellites or infrared/smoke scanners are expensive, using low-cost meteorological data to warn the public of fire danger and to support fire management decisions has received much attention. In this section, we illustrate the proposed methodology with an analysis of the forest fire data (Cortez and Morais, 2007). The data are available at: pcortez/forestfires/.

This forest fire data set contains 517 observations and was collected from January 2000 to December 2003 in the Montesinho natural park of the Trás-os-Montes northeast region of Portugal. On a daily basis, every time a forest fire occurred, many features were recorded, such as time, date, spatial location, and weather conditions. Following Cortez and Morais (2007), we use four meteorological variables, namely outside temperature (temp), outside relative humidity (RH), outside wind speed (wind), and outside rain (rain), to predict the total burned area (area). We fit the data by LSE, MEDREG, MM-estimate, and MODLR. One important feature of this data set is that it contains outliers and a positively skewed response variable (area).

Table 2: Average (Std) of coverage probabilities over 1,000 repetitions with σ = 2.

Width   Method        n=50           n=100          n=200          n=400
0.1σ    LSE           0.034(0.015)   0.032(0.011)   0.030(0.009)   0.029(0.007)
        MEDREG        0.073(0.018)   0.077(0.014)   0.078(0.012)   0.080(0.010)
        MM-estimate   0.065(0.023)   0.067(0.019)   0.066(0.015)   0.067(0.012)
        MODLR         0.087(0.016)   0.092(0.012)   0.095(0.010)   0.095(0.009)
0.2σ    LSE           0.069(0.028)   0.065(0.022)   0.061(0.015)   0.059(0.013)
        MEDREG        0.144(0.033)   0.153(0.024)   0.155(0.019)   0.158(0.015)
        MM-estimate   0.129(0.042)   0.133(0.034)   0.132(0.027)   0.134(0.021)
        MODLR         0.170(0.027)   0.179(0.018)   0.184(0.013)   0.186(0.012)
0.5σ    LSE           0.186(0.062)   0.181(0.047)   0.174(0.035)   0.171(0.028)
        MEDREG        0.338(0.061)   0.355(0.040)   0.360(0.029)   0.365(0.022)
        MM-estimate   0.313(0.080)   0.322(0.062)   0.325(0.046)   0.330(0.036)
        MODLR         0.378(0.049)   0.395(0.029)   0.404(0.018)   0.407(0.015)

Therefore, it is expected that the traditional mean regression based on LSE will not work well. To see why MODLR has better prediction properties than the other methods, we compare the widths of prediction intervals at the same confidence level. Suppose we have obtained the parameter estimate β̂ and the corresponding residuals ε̂_1, ..., ε̂_n from fitting the data; we use ε̂_[i] to denote the ith smallest residual. The traditional method constructs a prediction interval for a new predictor x_new as

    (x_new^T β̂ + ε̂_[n_1], x_new^T β̂ + ε̂_[n_2]),

where n_1 = nα/2 and n_2 = n − n_1. This symmetric method is ideal if the regression error is symmetric. Since MODLR focuses on the highest conditional density region, we propose a method for MODLR that uses the information in the skewed error density to construct prediction intervals with (1 − α)100% confidence level. Let f̂(·) be the kernel density estimate of ε based on the residuals ε̂_1, ..., ε̂_n from the MODLR fit. We propose to find indexes k_1 < k_2 such that k_2 − k_1 = n_2 − n_1 = n(1 − α) and f̂(ε̂_[k_1]) ≈ f̂(ε̂_[k_2]). The proposed prediction interval for the new predictor x_new is then

    (x_new^T β̂ + ε̂_[k_1], x_new^T β̂ + ε̂_[k_2]).

Table 3: Average widths (percentage of coverage) of the prediction intervals.

Method        Nominal confidence level
              10%            30%            50%            90%
LSE           2.166(0.101)   6.687(0.294)   12.70(0.493)   53.03(0.896)
MEDREG        0.975(0.091)   2.638(0.292)   6.506(0.491)   48.52(0.894)
MM-estimate   1.144(0.099)   2.910(0.294)   6.499(0.497)   48.49(0.906)
MODLR         0.012(0.112)   0.035(0.311)   0.571(0.499)   26.44(0.899)

We propose the following iterative algorithm to find the indexes k_1 and k_2. Let k_1 = n_1 and k_2 = n_2 be the initial values.

Step 1: If f̂(ε̂_[k_1]) < f̂(ε̂_[k_2]) and f̂(ε̂_[k_1+1]) < f̂(ε̂_[k_2+1]), set k_1 = k_1 + 1 and k_2 = k_2 + 1; if f̂(ε̂_[k_1]) > f̂(ε̂_[k_2]) and f̂(ε̂_[k_1−1]) > f̂(ε̂_[k_2−1]), set k_1 = k_1 − 1 and k_2 = k_2 − 1.

Step 2: Iterate the above procedure until neither of the two conditions is satisfied or (k_1 − 1)(k_2 − n) = 0.

A code sketch of this window search is given after the findings below.

In Table 3, we report the average widths and the actual coverage rates of the prediction intervals at the 10%, 30%, 50%, and 90% confidence levels. The actual coverage rates are estimated by leave-one-out cross validation. From Table 3, we have the following findings:

1. All the prediction intervals are well calibrated: the actual coverage rates are very close to the nominal confidence levels.

2. The average widths of the prediction intervals constructed around MODLR are much shorter than those of the intervals constructed around the other three estimates.

3. Both MEDREG and the MM-estimate have shorter prediction intervals than LSE.
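As referenced above, here is a minimal sketch of the window search in Steps 1-2. SciPy's gaussian_kde stands in for the kernel density estimate f̂, the toy residuals and α are illustrative, and 0-based indexing turns the stopping rule (k_1 − 1)(k_2 − n) = 0 into k_1 = 0 or k_2 = n − 1.

```python
import numpy as np
from scipy.stats import gaussian_kde

def skewed_pi_indices(res_sorted, n1, n2, fhat):
    """Shift the fixed-length window [k1, k2] over the sorted residuals,
    following Steps 1-2, until the estimated density is (approximately)
    balanced at the endpoints or a boundary is reached."""
    n = len(res_sorted)
    k1, k2 = n1, n2
    while 0 < k1 and k2 < n - 1:
        f1, f2 = fhat(res_sorted[k1]), fhat(res_sorted[k2])
        if f1 < f2 and fhat(res_sorted[k1 + 1]) < fhat(res_sorted[k2 + 1]):
            k1, k2 = k1 + 1, k2 + 1
        elif f1 > f2 and fhat(res_sorted[k1 - 1]) > fhat(res_sorted[k2 - 1]):
            k1, k2 = k1 - 1, k2 - 1
        else:
            break
    return k1, k2

# Usage with hypothetical MODLR residuals `res` (skewed toy values here).
rng = np.random.default_rng(0)
res = np.sort(rng.gamma(2.0, 1.0, 500) - 2.0)
kde = gaussian_kde(res)
alpha, n = 0.10, len(res)
n1 = int(np.ceil(n * alpha / 2)) - 1               # 0-based index of eps_[n1]
n2 = n1 + int(np.floor(n * (1 - alpha)))           # keeps k2 - k1 = n(1 - alpha)
k1, k2 = skewed_pi_indices(res, n1, n2, lambda t: float(kde(t)[0]))
# Interval for x_new: (x_new @ beta_hat + res[k1], x_new @ beta_hat + res[k2])
```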

4. Summary and Discussions

In this article, we have proposed a new, powerful data analysis tool, called modal regression, to explore the relationship between a response variable and a set of covariates. Modal regression investigates this relationship using the most likely conditional values rather than the conditional mean used by traditional regression techniques. When the error is skewed, modal regression provides a more meaningful prediction than the LSE. Our simulation results demonstrated that modal regression yields larger coverage probabilities for prediction than some other commonly used regression methods when short intervals of the same length, centered around each estimate, are used.

Further research can be conducted on how to construct the shortest (skewed) prediction interval at a given confidence level using the information in the skewed error density. One related work is Kim and Lindsay (2011), who proposed using confidence distribution sampling to visualize confidence sets. In the forest fire data application, we provided one possible way to construct skewed prediction intervals for MODLR. Based on the cross validation results, the proposed skewed prediction intervals for MODLR were much shorter than the prediction intervals constructed by other commonly used regression methods for the forest fire data.

Modal linear regression assumes that the mode of the conditional density of Y given x is a linear function of x. The idea of modal linear regression can easily be generalized to other models, such as nonlinear regression, nonparametric regression, and varying coefficient partially linear regression. In addition, it will be interesting to study how to select informative variables based on the modal regression idea. These will be topics of our future research.

Appendix

The following technical conditions are imposed in this section.

(A1) g^(v)(t | x), v = 0, 1, 2, 3, is continuous in a neighborhood of 0, and g'(0 | x) = 0 for any x.

(A2) n^{-1} Σ_{i=1}^n g''(0 | x_i) x_i x_i^T = J + o_p(1), n^{-1} Σ_{i=1}^n g'''(0 | x_i) x_i = K + o_p(1), and n^{-1} Σ_{i=1}^n g(0 | x_i) x_i x_i^T = L + o_p(1), where J < 0, i.e., −J is a positive definite matrix.

(A3) n^{-1} Σ_{i=1}^n ‖x_i‖^4 = O_p(1).

The conditions above are mild and are fulfilled in many applications.

Proof of Theorem 2.1: Note that

    log{Q_h(β^(k+1))} − log{Q_h(β^(k))}
    = log{Σ_{i=1}^n φ_h(y_i − x_i^T β^(k+1))} − log{Σ_{i=1}^n φ_h(y_i − x_i^T β^(k))}
    = log[Σ_{i=1}^n φ_h(y_i − x_i^T β^(k+1)) / Σ_{i=1}^n φ_h(y_i − x_i^T β^(k))]
    = log[Σ_{i=1}^n {φ_h(y_i − x_i^T β^(k)) / Σ_{j=1}^n φ_h(y_j − x_j^T β^(k))} {φ_h(y_i − x_i^T β^(k+1)) / φ_h(y_i − x_i^T β^(k))}]
    = log[Σ_{i=1}^n π(i | β^(k)) φ_h(y_i − x_i^T β^(k+1)) / φ_h(y_i − x_i^T β^(k))].

Based on Jensen's inequality, we have

    log{Q_h(β^(k+1))} − log{Q_h(β^(k))} ≥ Σ_{i=1}^n π(i | β^(k)) log{φ_h(y_i − x_i^T β^(k+1)) / φ_h(y_i − x_i^T β^(k))}.

Based on the property of the M-step in (2.6), the right-hand side is nonnegative; hence log{Q_h(β^(k+1))} − log{Q_h(β^(k))} ≥ 0 and thus Q_h(β^(k+1)) ≥ Q_h(β^(k)).

Proof of Theorem 2.2: Note that φ''_h(t) = h^{-3}(t^2/h^2 − 1)φ(t/h) and φ'_h(t) = −(t/h^3)φ(t/h). Let a_n = (nh^3)^{-1/2} + h^2. It is sufficient to show that for any given η > 0 there exists a large constant c such that

    P{ sup_{‖µ‖=c} Q_h(β_0 + a_n µ) < Q_h(β_0) } ≥ 1 − η.    (A.1)

Let X = (x_1, ..., x_n)^T and y = (y_1, ..., y_n)^T. Denote

    K_n ≡ ∂Q_h(β_0)/∂β = −(1/n) Σ_{i=1}^n φ'_h(y_i − x_i^T β_0) x_i,    (A.2)

    J_n ≡ ∂^2 Q_h(β_0)/∂β ∂β^T = (1/n) Σ_{i=1}^n φ''_h(y_i − x_i^T β_0) x_i x_i^T,    (A.3)

where Q_h(β) is defined in (2.3) and β_0 is the true parameter value. By directly calculating the means and variances, we get

    E(J_n | x) = (1/n) Σ_{i=1}^n g''(0 | x_i) x_i x_i^T {1 + o_p(1)} = J{1 + o_p(1)},
    Var(J_n | x) = O_p{(nh^5)^{-1}},
    E(K_n | x) = (h^2/(2n)) Σ_{i=1}^n g'''(0 | x_i) x_i {1 + o_p(1)} = (h^2/2) K{1 + o_p(1)},
    Cov(K_n | x) = (ν_2/(n^2 h^3)) Σ_{i=1}^n g(0 | x_i) x_i x_i^T {1 + o(1)} = (ν_2/(nh^3)) L{1 + o_p(1)},    (A.4)

where J = lim n^{-1} Σ_{i=1}^n g''(0 | x_i) x_i x_i^T, K = lim n^{-1} Σ_{i=1}^n g'''(0 | x_i) x_i, and L = lim n^{-1} Σ_{i=1}^n g(0 | x_i) x_i x_i^T. By convention, the variance of a matrix is computed elementwise. Using the result X = E(X) + O_p({Var(X)}^{1/2}) and nh^5 → ∞, we have

J_n = J + o_p(1). Notice that

    Q_h(β_0 + a_n µ) − Q_h(β_0) = a_n K_n^T µ + (a_n^2/2) µ^T J_n µ − (a_n^3/(6nh^4)) Σ_{i=1}^n φ'''((y_i − x_i^T β̃)/h)(x_i^T µ)^3
    ≡ M_1 + M_2 + M_3,    (A.5)

where ‖µ‖ = c and ‖β̃ − β_0‖ ≤ c a_n. From (A.4), we get K_n = O_p(a_n) and hence M_1 = O_p(a_n^2). Note that M_2 = 0.5 a_n^2 µ^T J µ {1 + o_p(1)}. Based on the boundedness of φ^(4)(t) and ‖β̃ − β_0‖ ≤ c a_n, we have

    φ'''((y_i − x_i^T β̃)/h) = φ'''((y_i − x_i^T β_0)/h)(1 + o_p(1)).

Noting that φ'''(t) = (3t − t^3)φ(t), based on a Taylor expansion and the symmetry of φ(t), we have

    E{φ'''((Y_i − x_i^T β_0)/h)} = O_p(h^4),  Var{φ'''((Y_i − x_i^T β_0)/h)} = O_p(h).    (A.6)

Since nh^5 → ∞, we can prove that M_3 = o_p(a_n^2). For any η > 0, we can choose c large enough that the term M_2 dominates the other two terms in (A.5) with probability at least 1 − η. Since J < 0, Q_h(β_0 + a_n µ) − Q_h(β_0) < 0 with probability at least 1 − η, and the result of Theorem 2.2 follows.

Proof of Theorem 2.3: Suppose β̂ is the consistent solution of ∂Q_h(β)/∂β = 0 found in Theorem 2.2. Based on a Taylor expansion, we have

    0 = ∂Q_h(β̂)/∂β = K_n + (J_n + L_n)(β̂ − β_0),    (A.7)

where

    L_n = (1/(2nh^4)) Σ_{i=1}^n φ'''((Y_i − β̃^T x_i)/h) {(β̂ − β_0)^T x_i} x_i x_i^T,

with ‖β̃ − β_0‖ ≤ ‖β̂ − β_0‖. Based on (A.7), we have β̂ − β_0 = −(J_n + L_n)^{-1} K_n. Since ‖β̂ − β_0‖ = O_p(a_n), where a_n = (nh^3)^{-1/2} + h^2, similarly to the treatment of M_3 in (A.5) we have L_n = o_p(1). Hence, based on (A.4), β̂ − β_0 = −J^{-1} K_n (1 + o_p(1)).

Next we prove the asymptotic normality of K_n* = √(nh^3) K_n. For any unit vector d ∈ R^{p+1}, we prove

    {d^T Cov(K_n*) d}^{-1/2} {d^T K_n* − d^T E(K_n*)} →_D N(0, 1).

Let

    ξ_i = −(nh)^{-1/2} φ'((Y_i − x_i^T β_0)/h) d^T x_i.

Then d^T K_n* = Σ_{i=1}^n ξ_i, and we check Lyapunov's condition. Based on the results in (A.4), we know that

    Cov(K_n*) = nh^3 Cov(K_n) = ν_2 L {1 + o(1)}.    (A.8)

Hence Var(d^T K_n*) = ν_2 d^T L d + o(1). So we only need to prove that nE|ξ_1|^3 → 0. Noticing that |d^T x_i|^2 ≤ ‖x_i‖^2 ‖d‖^2 = ‖x_i‖^2 and that φ'(·) is bounded, we have nE|ξ_1|^3 = O{(nh^3)^{-1/2}} → 0. So the asymptotic normality of K_n* holds, i.e.,

    √(nh^3) {K_n − (h^2/2) K (1 + o_p(h^2))} →_D N(0, ν_2 L).

Based on Slutsky's theorem, we have

    √(nh^3) [β̂ − β_0 − (h^2/2) J^{-1} K {1 + o_p(h^2)}] →_D N{0, ν_2 J^{-1} L J^{-1}}.

Proof of Theorem 2.4: Since β̂_s is root-n consistent, the asymptotic behavior of β̂_0 is the same as if β_s were known, and its asymptotic distribution can be derived from Theorem 2.3 by setting x = 1 and using the independence of ε and x, under which we have J^{-1} K = g'''(0)/g''(0) and J^{-1} L J^{-1} = g(0)/{g''(0)}^2. The result then follows.

Proof of Theorem 2.5: Let φ*(t) = √(2π) h φ_h(t). Then M = Σ_{i=1}^n φ*(y_i − x_i^T β̂), where β̂ = T(Z). Notice that φ*(·) attains its maximum at 0 with φ*(0) = 1, decreases monotonically on either side of 0, and satisfies φ*(t) → 0 as |t| → ∞.

We first prove that T(Z ∪ Z') stays bounded if m < M. Let ξ > 0 be such that m + nξ < M, and let C be such that φ*(t) ≤ ξ for |t| ≥ C. Let β be any vector such that |y − x^T β| ≥ C for all z = (x, y) in Z. Then

    Σ_{i=1}^{m+n} φ*(y_i − x_i^T T(Z ∪ Z')) ≥ M    (A.9)

and

    Σ_{i=1}^{m+n} φ*(y_i − x_i^T β) ≤ nξ + m.    (A.10)

From (A.9) and (A.10), one knows that T(Z ∪ Z') must satisfy |y − x^T T(Z ∪ Z')| < C for some point in Z, and thus T(Z ∪ Z') is bounded.

On the other hand, if m > M, let ξ > 0 be such that m − mξ > M, and let C be such that φ*(t) ≤ ξ for |t| ≥ C. Assume that all the points {(x_{n+1}, y_{n+1}), ..., (x_{n+m}, y_{n+m})} in Z' are identical and satisfy a linear relationship y = x^T β'. Let β be any vector such that |y_{n+1} − x_{n+1}^T β| ≥ C. Then

    Σ_{i=1}^{m+n} φ*(y_i − x_i^T β) ≤ M + mξ    (A.11)

and

    Σ_{i=1}^{m+n} φ*(y_i − x_i^T β') ≥ m.    (A.12)

From (A.11) and (A.12), one knows that T(Z ∪ Z') must satisfy |y_{n+1} − x_{n+1}^T T(Z ∪ Z')| < C. If we let y_{n+1} → ∞ with x_{n+1} fixed, T(Z ∪ Z') must go off to infinity, and we have breakdown.

References

Chaudhuri, P. and Marron, J. S. (1999). SiZer for Exploration of Structures in Curves. Journal of the American Statistical Association, 94.

Cortez, P. and Morais, A. (2007). A Data Mining Approach to Predict Forest Fires using Meteorological Data. Proceedings of the 13th EPIA Portuguese Conference on Artificial Intelligence, December, Guimaraes, Portugal.

Davies, P. L. and Kovac, A. (2004). Densities, Spectral Densities and Modality. Annals of Statistics, 32.

Donoho, D. L. (1982). Breakdown Properties of Multivariate Location Estimators. Ph.D. qualifying paper, Department of Statistics, Harvard University.

Donoho, D. L. and Huber, P. J. (1983). The Notion of Breakdown Point. In: Doksum, K. and Hodges, J. L. (Eds.), A Festschrift for Erich Lehmann. Wadsworth, Belmont, CA.

Eddy, W. F. (1980). Optimum Kernel Estimators of the Mode. Annals of Statistics, 8.

Fisher, N. I. and Marron, J. S. (2001). Mode Testing via the Excess Mass Estimate. Biometrika, 88.

Friedman, J. H. and Fisher, N. I. (1999). Bump Hunting in High-Dimensional Data. Statistics and Computing, 9.

Hall, P., Minnotte, M. C., and Zhang, C. (2004). Bump Hunting with Non-Gaussian Kernels. Annals of Statistics, 32.

Hampel, F. R. (1971). A General Qualitative Definition of Robustness. Annals of Mathematical Statistics, 42.

Hampel, F. R. (1974). The Influence Curve and Its Role in Robust Estimation. Journal of the American Statistical Association, 69.

Huber, P. J. (1984). Finite Sample Breakdown Points of M- and P-Estimators. Annals of Statistics, 12.

Kemp, G. C. R. and Santos Silva, J. M. C. (2010). Regression towards the Mode. Department of Economics, University of Essex, Discussion Paper. jmcss/research.html.

Kim, D. and Lindsay, B. G. (2011). Using Confidence Distribution Sampling to Visualize Confidence Sets. Statistica Sinica, 21(2).

Lee, M. J. (1989). Mode Regression. Journal of Econometrics, 42.

Li, J., Ray, S., and Lindsay, B. G. (2007). A Nonparametric Statistical Approach to Clustering via Mode Identification. Journal of Machine Learning Research, 8(8).

Muller, D. W. and Sawitzki, G. (1991). Excess Mass Estimates and Tests for Multimodality. Journal of the American Statistical Association, 86.

Parzen, E. (1962). On Estimation of a Probability Density Function and Mode. The Annals of Mathematical Statistics, 33.

Ray, S. and Lindsay, B. G. (2005). The Topography of Multivariate Normal Mixtures. Annals of Statistics, 33.

Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.

Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization. Wiley, New York.

Yao, W. and Lindsay, B. G. (2009). Bayesian Mixture Labeling by Highest Posterior Density. Journal of the American Statistical Association, 104.

Yohai, V. J. (1987). High Breakdown-Point and High Efficiency Robust Estimates for Regression. The Annals of Statistics, 15.

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015

More information

Local Polynomial Wavelet Regression with Missing at Random

Local Polynomial Wavelet Regression with Missing at Random Applied Mathematical Sciences, Vol. 6, 2012, no. 57, 2805-2819 Local Polynomial Wavelet Regression with Missing at Random Alsaidi M. Altaher School of Mathematical Sciences Universiti Sains Malaysia 11800

More information

On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness

On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Statistics and Applications {ISSN 2452-7395 (online)} Volume 16 No. 1, 2018 (New Series), pp 289-303 On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Snigdhansu

More information

Does k-th Moment Exist?

Does k-th Moment Exist? Does k-th Moment Exist? Hitomi, K. 1 and Y. Nishiyama 2 1 Kyoto Institute of Technology, Japan 2 Institute of Economic Research, Kyoto University, Japan Email: hitomi@kit.ac.jp Keywords: Existence of moments,

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Inference on distributions and quantiles using a finite-sample Dirichlet process

Inference on distributions and quantiles using a finite-sample Dirichlet process Dirichlet IDEAL Theory/methods Simulations Inference on distributions and quantiles using a finite-sample Dirichlet process David M. Kaplan University of Missouri Matt Goldman UC San Diego Midwest Econometrics

More information

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract

More information

The EM Algorithm for the Finite Mixture of Exponential Distribution Models

The EM Algorithm for the Finite Mixture of Exponential Distribution Models Int. J. Contemp. Math. Sciences, Vol. 9, 2014, no. 2, 57-64 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ijcms.2014.312133 The EM Algorithm for the Finite Mixture of Exponential Distribution

More information

Smooth simultaneous confidence bands for cumulative distribution functions

Smooth simultaneous confidence bands for cumulative distribution functions Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang

More information

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Put your solution to each problem on a separate sheet of paper. Problem 1. (5166) Assume that two random samples {x i } and {y i } are independently

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Local linear multiple regression with variable. bandwidth in the presence of heteroscedasticity

Local linear multiple regression with variable. bandwidth in the presence of heteroscedasticity Local linear multiple regression with variable bandwidth in the presence of heteroscedasticity Azhong Ye 1 Rob J Hyndman 2 Zinai Li 3 23 January 2006 Abstract: We present local linear estimator with variable

More information

Nonparametric Heteroscedastic Transformation Regression Models for Skewed Data, with an Application to Health Care Costs

Nonparametric Heteroscedastic Transformation Regression Models for Skewed Data, with an Application to Health Care Costs Nonparametric Heteroscedastic Transformation Regression Models for Skewed Data, with an Application to Health Care Costs Xiao-Hua Zhou, Huazhen Lin, Eric Johnson Journal of Royal Statistical Society Series

More information

Quantile Regression for Extraordinarily Large Data

Quantile Regression for Extraordinarily Large Data Quantile Regression for Extraordinarily Large Data Shih-Kang Chao Department of Statistics Purdue University November, 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile regression Two-step

More information

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions Spatial inference I will start with a simple model, using species diversity data Strong spatial dependence, Î = 0.79 what is the mean diversity? How precise is our estimate? Sampling discussion: The 64

More information

Gaussian Mixture Models

Gaussian Mixture Models Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some

More information

Long-Run Covariability

Long-Run Covariability Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips

More information

Nonparametric Econometrics

Nonparametric Econometrics Applied Microeconometrics with Stata Nonparametric Econometrics Spring Term 2011 1 / 37 Contents Introduction The histogram estimator The kernel density estimator Nonparametric regression estimators Semi-

More information

University, Tempe, Arizona, USA b Department of Mathematics and Statistics, University of New. Mexico, Albuquerque, New Mexico, USA

University, Tempe, Arizona, USA b Department of Mathematics and Statistics, University of New. Mexico, Albuquerque, New Mexico, USA This article was downloaded by: [University of New Mexico] On: 27 September 2012, At: 22:13 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered

More information

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error

More information

Quantile regression and heteroskedasticity

Quantile regression and heteroskedasticity Quantile regression and heteroskedasticity José A. F. Machado J.M.C. Santos Silva June 18, 2013 Abstract This note introduces a wrapper for qreg which reports standard errors and t statistics that are

More information

Lecture 5: Clustering, Linear Regression

Lecture 5: Clustering, Linear Regression Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 .0.0 5 5 1.0 7 5 X2 X2 7 1.5 1.0 0.5 3 1 2 Hierarchical clustering

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference

Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference 1 / 171 Bootstrap inference Francisco Cribari-Neto Departamento de Estatística Universidade Federal de Pernambuco Recife / PE, Brazil email: cribari@gmail.com October 2013 2 / 171 Unpaid advertisement

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

A Comparison of Robust Estimators Based on Two Types of Trimming

A Comparison of Robust Estimators Based on Two Types of Trimming Submitted to the Bernoulli A Comparison of Robust Estimators Based on Two Types of Trimming SUBHRA SANKAR DHAR 1, and PROBAL CHAUDHURI 1, 1 Theoretical Statistics and Mathematics Unit, Indian Statistical

More information

Uncertainty Quantification for Inverse Problems. November 7, 2011

Uncertainty Quantification for Inverse Problems. November 7, 2011 Uncertainty Quantification for Inverse Problems November 7, 2011 Outline UQ and inverse problems Review: least-squares Review: Gaussian Bayesian linear model Parametric reductions for IP Bias, variance

More information

Finite Population Sampling and Inference

Finite Population Sampling and Inference Finite Population Sampling and Inference A Prediction Approach RICHARD VALLIANT ALAN H. DORFMAN RICHARD M. ROYALL A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane

More information

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology

More information

Label switching and its solutions for frequentist mixture models

Label switching and its solutions for frequentist mixture models Journal of Statistical Computation and Simulation ISSN: 0094-9655 (Print) 1563-5163 (Online) Journal homepage: http://www.tandfonline.com/loi/gscs20 Label switching and its solutions for frequentist mixture

More information

Neural Network Approach to Estimating Conditional Quantile Polynomial Distributed Lag (QPDL) Model with an Application to Rubber Price Returns

Neural Network Approach to Estimating Conditional Quantile Polynomial Distributed Lag (QPDL) Model with an Application to Rubber Price Returns American Journal of Business, Economics and Management 2015; 3(3): 162-170 Published online June 10, 2015 (http://www.openscienceonline.com/journal/ajbem) Neural Network Approach to Estimating Conditional

More information

REGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University

REGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University REGRESSION ITH SPATIALL MISALIGNED DATA Lisa Madsen Oregon State University David Ruppert Cornell University SPATIALL MISALIGNED DATA 10 X X X X X X X X 5 X X X X X 0 X 0 5 10 OUTLINE 1. Introduction 2.

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Reading: Hoff Chapter 9 November 4, 2009 Problem Data: Observe pairs (Y i,x i ),i = 1,... n Response or dependent variable Y Predictor or independent variable X GOALS: Exploring

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 Discriminative vs Generative Models Discriminative: Just learn a decision boundary between your

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Sílvia Gonçalves and Benoit Perron Département de sciences économiques,

More information

Indian Statistical Institute

Indian Statistical Institute Indian Statistical Institute Introductory Computer programming Robust Regression methods with high breakdown point Author: Roll No: MD1701 February 24, 2018 Contents 1 Introduction 2 2 Criteria for evaluating

More information

Small Sample Corrections for LTS and MCD

Small Sample Corrections for LTS and MCD myjournal manuscript No. (will be inserted by the editor) Small Sample Corrections for LTS and MCD G. Pison, S. Van Aelst, and G. Willems Department of Mathematics and Computer Science, Universitaire Instelling

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

A nonparametric method of multi-step ahead forecasting in diffusion processes

A nonparametric method of multi-step ahead forecasting in diffusion processes A nonparametric method of multi-step ahead forecasting in diffusion processes Mariko Yamamura a, Isao Shoji b a School of Pharmacy, Kitasato University, Minato-ku, Tokyo, 108-8641, Japan. b Graduate School

More information

AFT Models and Empirical Likelihood

AFT Models and Empirical Likelihood AFT Models and Empirical Likelihood Mai Zhou Department of Statistics, University of Kentucky Collaborators: Gang Li (UCLA); A. Bathke; M. Kim (Kentucky) Accelerated Failure Time (AFT) models: Y = log(t

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

Regression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St.

Regression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St. Regression Graphics R. D. Cook Department of Applied Statistics University of Minnesota St. Paul, MN 55108 Abstract This article, which is based on an Interface tutorial, presents an overview of regression

More information

Lecture 5: Clustering, Linear Regression

Lecture 5: Clustering, Linear Regression Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 Hierarchical clustering Most algorithms for hierarchical clustering

More information

ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS

ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS Statistica Sinica 13(2003), 1201-1210 ESTIMATION OF NONLINEAR BERKSON-TYPE MEASUREMENT ERROR MODELS Liqun Wang University of Manitoba Abstract: This paper studies a minimum distance moment estimator for

More information

Empirical Likelihood Methods for Sample Survey Data: An Overview

Empirical Likelihood Methods for Sample Survey Data: An Overview AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use

More information