Model-free prediction intervals for regression and autoregression
Dimitris N. Politis, University of California, San Diego


To explain or to predict?
Models are indispensable for exploring and utilizing relationships between variables: explaining the world.
Use of models for prediction can be problematic when:
- a model is overspecified;
- parameter inference is highly model-specific (and sensitive to model mis-specification);
- prediction is carried out by plugging in the estimated parameters and treating the model as exactly true.
"All models are wrong but some are useful" (George Box).

A Toy Example
Assume the regression model: Y = β_0 + β_1 X + β_2 X^20 + error.
If β̂_2 is barely statistically significant, do you still use it in prediction?
If the true value of β_2 is close to zero and var(β̂_2) is large, it may be advantageous to omit β_2: allow a nonzero bias but minimize MSE.
A mis-specified model can be optimal for prediction!

Prediction Framework
a. Point predictors
b. Interval predictors
c. Predictive distribution
There is an abundant Bayesian literature in the parametric framework: Cox (1975), Geisser (1993), etc.
The frequentist and/or nonparametric literature is scarce.

I.i.d. set-up
Let ε_1, ..., ε_n be i.i.d. from the (unknown) cdf F_ε.
GOAL: prediction of the future ε_{n+1} based on the data.
F_ε is the predictive distribution, and its quantiles can be used to form prediction intervals.
The mean and median of F_ε are the optimal point predictors under an L_2 and an L_1 criterion, respectively.

I.i.d. data
F_ε is unknown but can be estimated by the empirical distribution function (edf) F̂_ε.
Practical model-free prediction intervals will be based on quantiles of F̂_ε, and the L_2 and L_1 optimal predictors will be approximated by the mean and median of F̂_ε, respectively.

Non-i.i.d. data
In general, the data Y_n = (Y_1, ..., Y_n) are not i.i.d.
So the predictive distribution of Y_{n+1} given the data will depend on Y_n and on X_{n+1}, a matrix of observable, explanatory (predictor) variables.
Key examples: regression and time series.

Models
Regression: Y_t = µ(x_t) + σ(x_t) ε_t with ε_t i.i.d. (0,1).
Time series: Y_t = µ(Y_{t-1}, ..., Y_{t-p}; x_t) + σ(Y_{t-1}, ..., Y_{t-p}; x_t) ε_t.
The above are flexible, nonparametric models.
Given one of the above models, optimal model-based predictors of a future Y-value can be constructed.
Nevertheless, the prediction problem can be carried out in a fully model-free setting, offering at the very least robustness against model mis-specification.

Transformation vs. modeling
DATA: Y_n = (Y_1, ..., Y_n).
GOAL: predict the future value Y_{n+1} given the data.
Find an invertible transformation H_m such that (for all m) the vector ε_m = H_m(Y_m) has i.i.d. components ε_k, where ε_m = (ε_1, ..., ε_m):
Y_m --H_m--> ε_m and ε_m --H_m^{-1}--> Y_m.

Transformation
(i) (Y_1, ..., Y_m) --H_m--> (ε_1, ..., ε_m)
(ii) (ε_1, ..., ε_m) --H_m^{-1}--> (Y_1, ..., Y_m)
(i) implies that ε_1, ..., ε_n are known given the data Y_1, ..., Y_n.
(ii) implies that Y_{n+1} is a function of ε_1, ..., ε_n, and ε_{n+1}.
So, given the data Y_n, Y_{n+1} is a function of ε_{n+1} only, i.e., Y_{n+1} = h(ε_{n+1}).

Model-free prediction principle
Y_{n+1} = h(ε_{n+1}). Suppose ε_1, ..., ε_n ~ cdf F_ε.
The mean and median of h(ε), where ε ~ F_ε, are the optimal point predictors of Y_{n+1} under the L_2 and L_1 criterion, respectively.
The whole predictive distribution of Y_{n+1} is the distribution of h(ε) when ε ~ F_ε.
To predict Y²_{n+1}, replace h by h²; to predict g(Y_{n+1}), replace h by g ∘ h.
The unknown F_ε can be estimated by F̂_ε, the edf of ε_1, ..., ε_n.
But the predictive distribution needs bootstrapping as well, because h is estimated from the data.

Nonparametric Regression
MODEL (*): Y_t = µ(x_t) + σ(x_t) ε_t
- x_t univariate and deterministic
- Y_t data available for t = 1, ..., n
- ε_t i.i.d. (0,1) from the (unknown) cdf F
- the functions µ(·) and σ(·) unknown but smooth

Nonparametric Regression
Note: µ(x) = E(Y|x) and σ²(x) = Var(Y|x). Let m_x, s_x be smoothing estimators of µ(x), σ(x).
Examples: kernel smoothers, local linear fitting, wavelets, etc.
E.g., the Nadaraya-Watson estimator m_x = Σ_{i=1}^n Y_i K̃((x − x_i)/h), where K(x) is the kernel, h the bandwidth, and K̃((x − x_i)/h) = K((x − x_i)/h) / Σ_{k=1}^n K((x − x_k)/h).
Similarly, s_x² = M_x − m_x², where M_x = Σ_{i=1}^n Y_i² K̃((x − x_i)/q) with its own bandwidth q.
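A minimal R sketch of these two estimators, using a normal kernel; the function names nw.mean and nw.sd are illustrative, not from any package.

# Nadaraya-Watson estimators of mu(x) and sigma(x); K is the standard normal kernel.
nw.mean <- function(x, xdat, ydat, h) {
  w <- dnorm((x - xdat) / h)          # kernel weights K((x - x_i)/h)
  sum(ydat * w) / sum(w)              # weighted average of the Y_i
}
nw.sd <- function(x, xdat, ydat, h, q) {
  m <- nw.mean(x, xdat, ydat, h)      # m_x
  M <- nw.mean(x, xdat, ydat^2, q)    # M_x: smoother of Y_i^2 with its own bandwidth q
  sqrt(max(M - m^2, 0))               # s_x = sqrt(M_x - m_x^2), guarded against negatives
}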

FIGURE 1: (a) Log-wage vs. age data with fitted kernel smoother m_x. (b) Unstudentized residuals Y − m_x with superimposed s_x.
Canadian Census data cps71 from the np package of R: wage vs. age for 205 male individuals with common education.
The kernel smoother is problematic at the left boundary; local linear fitting (Fan and Gijbels, 1996) or reflection (Hall and Wehrly, 1991) does better there.

Residuals
(*): Y_t = µ(x_t) + σ(x_t) ε_t
Fitted residuals: e_t = (Y_t − m_{x_t}) / s_{x_t}.
Predictive residuals: ẽ_t = (Y_t − m^{(t)}_{x_t}) / s^{(t)}_{x_t}, where m^{(t)}_x and s^{(t)}_x are the estimators m and s computed from the delete-Y_t dataset {(Y_i, x_i), for all i ≠ t}.
ẽ_t is the (standardized) error in trying to predict Y_t from the delete-Y_t dataset.
Selection of the bandwidth parameters h and q is often done by cross-validation, i.e., pick h, q to minimize PRESS = Σ_{t=1}^n ẽ_t².
BETTER: L_1 cross-validation: pick h, q to minimize Σ_{t=1}^n |ẽ_t|.
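A sketch of the predictive (delete-one) residuals and of L_1 cross-validation in R, reusing the illustrative nw.mean and nw.sd above; x and y denote hypothetical data vectors.

pred.resid <- function(xdat, ydat, h, q) {
  sapply(seq_along(ydat), function(t) {
    m.t <- nw.mean(xdat[t], xdat[-t], ydat[-t], h)      # delete-Y_t estimators
    s.t <- nw.sd(xdat[t], xdat[-t], ydat[-t], h, q)
    (ydat[t] - m.t) / s.t                               # predictive residual
  })
}
# L1 cross-validation: choose (h, q) minimizing the sum of |predictive residuals|,
# e.g. via  optim(c(1, 1), function(p) sum(abs(pred.resid(x, y, p[1], p[2]))))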

Model-based (MB) point predictors
(*) Y_t = µ(x_t) + σ(x_t) ε_t with ε_t i.i.d. (0,1) with cdf F.
GOAL: predict a future response Y_f associated with the point x_f.
The L_2 optimal predictor of Y_f is E(Y_f | x_f), i.e., µ(x_f), which is approximated by m_{x_f}.
The L_1 optimal predictor of Y_f is the conditional median, i.e., µ(x_f) + σ(x_f) median(F), which is approximated by m_{x_f} + s_{x_f} median(F̂_e), where F̂_e is the edf of the residuals e_1, ..., e_n.

Model-based (MB) point predictors
(*) Y_t = µ(x_t) + σ(x_t) ε_t with ε_t i.i.d. (0,1) with cdf F.
DATASET cps71: salaries are logarithmically transformed, i.e., Y_t = log-salary.
To predict salary at age x_f we need to predict g(Y_f) where g(x) = exp(x).
The MB L_2 optimal predictor of g(Y_f) is E(g(Y_f) | x_f), estimated by n^{-1} Σ_{i=1}^n g(m_{x_f} + s_{x_f} e_i).
The naive predictor g(m_{x_f}) is grossly suboptimal when g is nonlinear.
The MB L_1 optimal predictor of g(Y_f) is estimated by the sample median of the set {g(m_{x_f} + s_{x_f} e_i), i = 1, ..., n}; the naive plug-in is OK iff g is monotone!
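As an illustration, a sketch of these two model-based predictors of g(Y_f) in R, again built on the hypothetical nw.mean and nw.sd above (g = exp for the salary example):

mb.predict <- function(xf, xdat, ydat, h, q, g = exp) {
  m <- sapply(xdat, nw.mean, xdat = xdat, ydat = ydat, h = h)        # m_{x_t}
  s <- sapply(xdat, nw.sd, xdat = xdat, ydat = ydat, h = h, q = q)   # s_{x_t}
  e <- (ydat - m) / s                                                # fitted residuals
  vals <- g(nw.mean(xf, xdat, ydat, h) + nw.sd(xf, xdat, ydat, h, q) * e)
  c(L2 = mean(vals), L1 = median(vals))                              # MB L2 and L1 predictors
}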

Resampling Algorithm for the predictive distribution of g(Y_f)
Prediction root: g(Y_f) − Π, where Π is the point predictor.
1. Bootstrap the (fitted or predictive) residuals r_1, ..., r_n to create pseudo-residuals r*_1, ..., r*_n, whose edf is denoted by F̂*_n.
2. Create pseudo-data Y*_i = m_{x_i} + s_{x_i} r*_i, for i = 1, ..., n.
3. Calculate a bootstrap pseudo-response Y*_f = m_{x_f} + s_{x_f} r, where r is drawn randomly from (r_1, ..., r_n).
4. Based on the pseudo-data Y*_1, ..., Y*_n, re-estimate the functions µ(x) and σ(x) by m*_x and s*_x.
5. Calculate the bootstrap root: g(Y*_f) − Π(g, m*_x, s*_x, Y_n, X_{n+1}, F̂*_n).
6. Repeat the above B times, and collect the B bootstrap roots in an empirical distribution with α-quantile denoted q(α).
Our estimate of the predictive distribution of g(Y_f) is the empirical df of the bootstrap roots shifted to the right by Π.
Then, a (1 − α)100% equal-tailed prediction interval for g(Y_f) is given by [Π + q(α/2), Π + q(1 − α/2)].
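The loop below is a minimal R sketch of this algorithm for the L_2 predictor; the residuals r (fitted or predictive) are passed in, and all function names are the illustrative ones introduced earlier.

mb.pred.int <- function(xf, xdat, ydat, r, h, q, g = identity, B = 1000, alpha = 0.1) {
  m  <- sapply(xdat, nw.mean, xdat = xdat, ydat = ydat, h = h)
  s  <- sapply(xdat, nw.sd,   xdat = xdat, ydat = ydat, h = h, q = q)
  mf <- nw.mean(xf, xdat, ydat, h); sf <- nw.sd(xf, xdat, ydat, h, q)
  Pi <- mean(g(mf + sf * r))                                   # point predictor
  roots <- replicate(B, {
    ystar  <- m + s * sample(r, length(r), replace = TRUE)     # pseudo-data Y*_i
    yfstar <- mf + sf * sample(r, 1)                           # pseudo-response Y*_f
    mstar  <- sapply(xdat, nw.mean, xdat = xdat, ydat = ystar, h = h)
    sstar  <- sapply(xdat, nw.sd,   xdat = xdat, ydat = ystar, h = h, q = q)
    estar  <- (ystar - mstar) / sstar                          # residuals in the bootstrap world
    Pistar <- mean(g(nw.mean(xf, xdat, ystar, h) + nw.sd(xf, xdat, ystar, h, q) * estar))
    g(yfstar) - Pistar                                         # bootstrap root
  })
  Pi + quantile(roots, c(alpha / 2, 1 - alpha / 2))            # equal-tailed interval
}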

Model-free prediction in regression
The previous discussion hinged on the model (*): Y_t = µ(x_t) + σ(x_t) ε_t with ε_t i.i.d. (0,1) from cdf F.
What happens if model (*) does not hold true?
E.g., the skewness and/or kurtosis of Y_t may depend on x_t.
cps71 data: the skewness and kurtosis of salary depend on age.

FIGURE: (a) Log-wage SKEWNESS vs. age. (b) Log-wage KURTOSIS vs. age. Both skewness and kurtosis are nonconstant!

General background
Could try skewness-reducing transformations, but the log already does that.
Could try ACE, AVAS, etc.
There is a simpler, more general solution!

General background
The Y_t's are still independent but not identically distributed.
We denote the conditional distribution of Y_f given x_f by D_x(y) = P{Y_f ≤ y | x_f = x}.
Assume the quantity D_x(y) is continuous in both x and y.
(With a categorical response, standard methods like Generalized Linear Models can be invoked, e.g., logistic regression, Poisson regression, etc.)
Since D_x(·) depends in a smooth way on x, we can estimate D_x(y) by the local empirical N_{x,h}^{-1} Σ_{t: |x_t − x| < h/2} 1{Y_t ≤ y}, where 1{·} is the indicator and N_{x,h} is the number of summands, i.e., N_{x,h} = #{t : |x_t − x| < h/2}.

Constructing the transformation
A more general estimator: D̂_x(y) = Σ_{i=1}^n 1{Y_i ≤ y} K̃((x − x_i)/h).
D̂_x(y) is just a Nadaraya-Watson smoother of the variables 1{Y_t ≤ y}, t = 1, ..., n.
One can also use a local linear smoother of 1{Y_t ≤ y}, t = 1, ..., n.
The estimator D̂_x(y) enjoys many good properties, including asymptotic consistency; see e.g. Li and Racine (2007).
But D̂_x(y) is discontinuous in y, and therefore unacceptable! It could be smoothed by kernel methods, or...
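A one-function R sketch of this estimator (normal kernel, no smoothing in y; the name Dhat is illustrative):

Dhat <- function(y, x, xdat, ydat, h) {
  w <- dnorm((x - xdat) / h)               # kernel weights in the x direction
  sum((ydat <= y) * w) / sum(w)            # NW smoother of the indicators 1{Y_i <= y}
}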

Linear interpolation
FIGURE 2: (a) Empirical distribution of a test sample consisting of five N(0,1) and five N(1/2,1) independent r.v.'s with the piecewise-linear estimator D̄(·) superimposed; the vertical/horizontal lines indicate the inversion process, i.e., finding D̄^{-1}(0.75). (b) Q-Q plot of the transformed variables u_i vs. the quantiles of Uniform(0,1).

Constructing the transformation
Since the Y_t's are continuous r.v.'s, the probability integral transform is the key idea for transforming them to i.i.d.-ness.
To see why, note that if we let η_i = D_{x_i}(Y_i) for i = 1, ..., n, our transformation objective would be achieved exactly, since η_1, ..., η_n would be i.i.d. Uniform(0,1).
D_x(·) is not known, but we have the estimator D̄_x(·) as its proxy.
Therefore, our proposed transformation for the MF prediction principle is u_i = D̄_{x_i}(Y_i) for i = 1, ..., n.
D̄_x(·) is consistent, so u_1, ..., u_n are approximately i.i.d.
The probability integral transform has been used in the past for building better density estimators; see Ruppert and Cline (1994).

Model-free optimal predictors
Transformation: u_i = D̄_{x_i}(Y_i) for i = 1, ..., n.
The inverse transformation D̄_x^{-1} is well defined since D̄_x(·) is strictly increasing.
Let u_f = D̄_{x_f}(Y_f) and Y_f = D̄_{x_f}^{-1}(u_f).
D̄_{x_f}^{-1}(u_i) has (approximately) the same distribution as Y_f (conditionally on x_f) for any i.
So {D̄_{x_f}^{-1}(u_i), i = 1, ..., n} is a set of bona fide potential responses that can be used as proxies for Y_f.
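A rough R sketch of the transformation and the resulting model-free predictors. The strictly increasing estimator D̄ is replaced here by a crude grid inversion of the hypothetical Dhat above, so this is only an approximation to the construction described in the talk.

mf.predict <- function(xf, xdat, ydat, h, g = identity) {
  u <- mapply(function(xi, yi) Dhat(yi, xi, xdat, ydat, h), xdat, ydat)   # u_i
  ygrid <- seq(min(ydat), max(ydat), length.out = 500)
  Dxf <- sapply(ygrid, Dhat, x = xf, xdat = xdat, ydat = ydat, h = h)     # D at x_f on a grid
  yf  <- sapply(u, function(ui) ygrid[which.min(abs(Dxf - ui))])          # ~ inverse of D at x_f
  c(L2 = mean(g(yf)), L1 = median(g(yf)))                                 # MF optimal predictors
}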

These n valid potential responses {D̄_{x_f}^{-1}(u_i), i = 1, ..., n}, gathered together, give an approximate empirical distribution for Y_f from which our predictors will be derived.
The L_2 optimal predictor of g(Y_f) is the expected value of g(Y_f), approximated by n^{-1} Σ_{i=1}^n g(D̄_{x_f}^{-1}(u_i)).
The L_1 optimal predictor of g(Y_f) is approximated by the sample median of the set {g(D̄_{x_f}^{-1}(u_i)), i = 1, ..., n}.

Model-free optimal point predictors
TABLE: The model-free (MF²) optimal point predictors.
L_2 predictor of Y_f: n^{-1} Σ_{i=1}^n D̄_{x_f}^{-1}(D̄_{x_i}(Y_i))
L_1 predictor of Y_f: median{ D̄_{x_f}^{-1}(D̄_{x_i}(Y_i)) }
L_2 predictor of g(Y_f): n^{-1} Σ_{i=1}^n g(D̄_{x_f}^{-1}(D̄_{x_i}(Y_i)))
L_1 predictor of g(Y_f): median{ g(D̄_{x_f}^{-1}(D̄_{x_i}(Y_i))) }

Model-free model-fitting
The MF predictors (mean or median) can be used to give the equivalent of a model fit.
Focus on the L_2 optimal case with g(x) = x.
Calculating the MF predictor Π(x_f) = n^{-1} Σ_{i=1}^n g(D̄_{x_f}^{-1}(u_i)) for many different x_f values, say on a grid, yields the equivalent of a nonparametric smoother of a regression function: Model-Free Model-Fitting (MF²).
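For instance, evaluating the hypothetical mf.predict above over a grid of x_f values gives such a fit; x, y and the bandwidth below are placeholders.

xgrid  <- seq(min(x), max(x), length.out = 100)
mf.fit <- sapply(xgrid, function(xf) mf.predict(xf, x, y, h = 5.5)["L2"])
# plot(x, y); lines(xgrid, mf.fit)   # the equivalent of a nonparametric regression fit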

M.o.a.T.
MF² relieves the practitioner from the need to find the optimal transformation for additivity and variance stabilization, such as Box-Cox, ACE, AVAS, etc.; see Figures 3 and 4.
No need for a log-transformation of the salaries!
MF² is totally automatic!!

FIGURE 3: (a) Wage vs. age scatterplot. (b) Circles indicate the salary predictor n^{-1} Σ_{i=1}^n g(D̄_{x_f}^{-1}(u_i)) calculated from the log-wage data with g(x) exponential. In both panels, the superimposed solid line is the MF² salary predictor calculated from the raw data (without the log).

FIGURE 4: Q-Q plots of the u_i vs. the quantiles of Uniform(0,1). (a) The u_i's are obtained from the log-wage vs. age dataset of Figure 1 using bandwidth 5.5; (b) the u_i's are obtained from the raw (untransformed) dataset of Figure 3 using bandwidth 7.3.

MF² predictive distributions
For MF² we can always take g(x) = x; no need for other preliminary transformations.
Let g(Y_f) − Π be the prediction root, where Π is either the L_2 or the L_1 optimal predictor, i.e., Π = n^{-1} Σ_{i=1}^n g(D̄_{x_f}^{-1}(u_i)) or Π = median{ g(D̄_{x_f}^{-1}(u_i)) }.
Based on the Y data, estimate the conditional distribution D_x(·) by D̄_x(·), and let u_i = D̄_{x_i}(Y_i) to obtain the transformed data u_1, ..., u_n that are approximately i.i.d.

Resampling Algorithm: MF² predictive distribution of g(Y_f)
1. Bootstrap u_1, ..., u_n to create bootstrap pseudo-data u*_1, ..., u*_n, whose empirical distribution is denoted F̂*_n.
2. Use the inverse transformation D̄_x^{-1} to create pseudo-data in the Y domain, i.e., Y*_t = D̄_{x_t}^{-1}(u*_t) for t = 1, ..., n.
3. Generate a bootstrap pseudo-response Y*_f = D̄_{x_f}^{-1}(u), where u is drawn randomly from the set (u_1, ..., u_n).
4. Based on the pseudo-data Y*_t, re-estimate the conditional distribution D_x(·); denote the bootstrap estimator by D̄*_x(·).
5. Calculate the bootstrap root g(Y*_f) − Π*, where Π* = n^{-1} Σ_{i=1}^n g(D̄*_{x_f}^{-1}(u*_i)) or Π* = median{ g(D̄*_{x_f}^{-1}(u*_i)) }.
6. Repeat the above steps B times, and collect the B bootstrap roots in the form of an edf with α-quantile denoted q(α).
The predictive distribution of g(Y_f) is the above edf shifted to the right by Π, and the MF² (1 − α)100% equal-tailed prediction interval for g(Y_f) is [Π + q(α/2), Π + q(1 − α/2)].
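A condensed R sketch of this algorithm, reusing the hypothetical Dhat and the grid-inversion shortcut from above in place of the piecewise-linear D̄:

Dinv <- function(u, x, xdat, ydat, h, ygrid) {                  # approximate inverse of Dhat at x
  Dx <- sapply(ygrid, Dhat, x = x, xdat = xdat, ydat = ydat, h = h)
  sapply(u, function(ui) ygrid[which.min(abs(Dx - ui))])
}
mf.pred.int <- function(xf, xdat, ydat, h, g = identity, B = 1000, alpha = 0.1) {
  ygrid <- seq(min(ydat), max(ydat), length.out = 500)
  u  <- mapply(function(xi, yi) Dhat(yi, xi, xdat, ydat, h), xdat, ydat)
  Pi <- mean(g(Dinv(u, xf, xdat, ydat, h, ygrid)))              # L2 point predictor
  roots <- replicate(B, {
    ustar  <- sample(u, length(u), replace = TRUE)              # bootstrap the u_i
    ystar  <- mapply(function(xi, ui) Dinv(ui, xi, xdat, ydat, h, ygrid), xdat, ustar)
    yfstar <- Dinv(sample(u, 1), xf, xdat, ydat, h, ygrid)      # pseudo-response Y*_f
    Pistar <- mean(g(Dinv(ustar, xf, xdat, ystar, h, ygrid)))   # predictor re-estimated from Y*
    g(yfstar) - Pistar                                          # bootstrap root
  })
  Pi + quantile(roots, c(alpha / 2, 1 - alpha / 2))
}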

MF²: Confidence intervals for the regression function without an additive model
To fix ideas, assume g(x) = x. The MF² L_2 optimal point predictor of Y_f is Π_{x_f} = n^{-1} Σ_{i=1}^n D̄_{x_f}^{-1}(u_i), where u_i = D̄_{x_i}(Y_i).
Π_{x_f} is an estimate of the regression function µ(x_f) = E(Y_f | x_f).
How does Π_{x_f} compare to the kernel smoother m_{x_f}?
Can we construct confidence intervals for the regression function µ(x_f) without the additive model (*): Y_t = µ(x_t) + σ(x_t) ε_t with ε_t i.i.d. (0,1)?

MF²: Confidence intervals for the regression function without an additive model
Note that Π_{x_f} is defined as a function of the approximately i.i.d. variables u_1, ..., u_n.
So Π_{x_f} can be bootstrapped by resampling the i.i.d. variables u_1, ..., u_n.
In fact, Π_{x_f} is asymptotically equivalent to the kernel smoother m_{x_f}, i.e., Π_{x_f} = m_{x_f} + o_P(1/√(nh)).
Recall that the u_i are approximately i.i.d. Uniform(0,1). So, for large n, Π_{x_f} = n^{-1} Σ_{i=1}^n D̄_{x_f}^{-1}(u_i) ≈ ∫_0^1 D̄_{x_f}^{-1}(u) du = m_{x_f}.
This is just the identity ∫ y dF(y) = ∫_0^1 F^{-1}(u) du.

Resampling Algorithm: MF² confidence intervals for µ(x_f)
Define the confidence root µ(x_f) − Π_{x_f}.
1. Bootstrap u_1, ..., u_n to create bootstrap pseudo-data u*_1, ..., u*_n, whose empirical distribution is denoted F̂*_n.
2. Use the inverse transformation D̄_x^{-1} to create pseudo-data in the Y domain, i.e., Y*_t = D̄_{x_t}^{-1}(u*_t) for t = 1, ..., n.
3. Based on the pseudo-data Y*_t, re-estimate the conditional distribution D_x(·); denote the bootstrap estimator by D̄*_x(·).
4. Calculate the bootstrap confidence root Π_{x_f} − Π*, where Π* = n^{-1} Σ_{i=1}^n D̄*_{x_f}^{-1}(u*_i).
5. Repeat the above steps B times, and collect the B bootstrap roots in the form of an edf with α-quantile denoted q(α).
The MF² (1 − α)100% equal-tailed confidence interval for µ(x_f) is [Π_{x_f} + q(α/2), Π_{x_f} + q(1 − α/2)].

Simulation: regression under model (*)
(*) Y_t = µ(x_t) + σ(x_t) ε_t with ε_t i.i.d. (0,1) with cdf F.
Design points x_1, ..., x_n for n = 100 drawn from a uniform distribution on (0, 2π).
µ(x) = sin(x), σ(x) = (cos(x/2) + 2)/7, and errors N(0,1) or Laplace.
Prediction points: x_f = π, where µ(x) has high slope but zero curvature (an easy case for estimation); and x_f = π/2 and x_f = 3π/2, where µ(x) has zero slope but high curvature (peak and valley), so m_x has large bias.
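A sketch of this data-generating mechanism in R; the standardized Laplace draw shown in the comment is one possible way to obtain variance-one errors.

n <- 100
x <- runif(n, 0, 2 * pi)                               # design points
mu    <- function(x) sin(x)
sigma <- function(x) (cos(x / 2) + 2) / 7
eps <- rnorm(n)                                        # N(0,1) errors, or standardized Laplace:
# eps <- (rexp(n) * sample(c(-1, 1), n, replace = TRUE)) / sqrt(2)
y <- mu(x) + sigma(x) * eps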

FIGURE 6: Typical scatterplots with superimposed kernel smoothers; (a) Normal data; (b) Laplace data.

[Table 4.2(a): Empirical coverage levels (CVR) of prediction intervals for the MB, MF/MB, MF and MF/MF methods and the NORMAL intervals, at several x_f points spanning the interval (0, 2π). Nominal coverage 0.90, sample size n = 100, bandwidths chosen by L_1 cross-validation. Error distribution: i.i.d. Normal. St. error of the CVRs ≈ 0.013.]

[Table 4.3(a): Empirical coverage levels (CVR) of prediction intervals for the MB, MF/MB, MF and MF/MF methods and the NORMAL intervals, at several x_f points spanning the interval (0, 2π). Nominal coverage 0.90, sample size n = 100, bandwidths chosen by L_1 cross-validation. Error distribution: i.i.d. Laplace.]

The NORMAL intervals exhibit under-coverage even when the true distribution is Normal; they need explicit bias correction.
Bootstrap methods are more variable due to the extra randomization.
The MF/MB intervals are more accurate than their MB analogs in the case x_f = π/2.
When x_f = π, the MB intervals are most accurate, and the MF/MB intervals seem to over-correct (and over-cover). Over-coverage can be attributed to bias leakage and should be alleviated with a larger sample size, higher-order kernels, or a bandwidth trick such as undersmoothing.

The performance of the MF² (resp. MF/MF²) intervals resembles that of the MB (resp. MF/MB) intervals.
The MF/MF² intervals are best at x_f = π/2; this is quite surprising, since one would expect the MB and MF/MB intervals to have a distinct advantage when model (*) is true.
The price to pay for using the generally valid MF/MF² intervals instead of the model-specific MF/MB ones is the increased variability of the interval length.

Simulation: regression without model (*)
Instead: Y = µ(x) + σ(x) ε_x with ε_x = (c_x Z + (1 − c_x) W) / √(c_x² + (1 − c_x)²), where Z ~ N(0,1) is independent of W, which also has mean 0 and variance 1.
W is either a centered exponential, i.e., (1/2)χ²_2 − 1, to capture skewness, or a rescaled Student's t with 5 d.f., i.e., √(3/5) t_5, to capture kurtosis.
The ε_x are independent but not i.i.d.: c_x = x/(2π) for x ∈ [0, 2π]. For large x, ε_x is close to Normal; for small x, ε_x is skewed/kurtotic.
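A sketch of this error mechanism in R, reusing mu, sigma and x from the previous sketch; the centering and scaling of W follow the standardization described above.

make.eps <- function(x, type = c("skewed", "kurtotic")) {
  type <- match.arg(type)
  Z <- rnorm(length(x))
  W <- if (type == "skewed") rchisq(length(x), df = 2) / 2 - 1    # (1/2) chi^2_2 - 1
       else sqrt(3 / 5) * rt(length(x), df = 5)                    # sqrt(3/5) t_5
  cx <- x / (2 * pi)                                               # mixing weight c_x
  (cx * Z + (1 - cx) * W) / sqrt(cx^2 + (1 - cx)^2)                # epsilon_x: mean 0, variance 1
}
y <- mu(x) + sigma(x) * make.eps(x, "skewed")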

[Table 4.4(a): CVRs of the MB, MF/MB, MF and MF/MF intervals and the NORMAL intervals at several x_f points spanning (0, 2π); error distribution non-i.i.d. skewed. Table 4.5(a): same layout; error distribution non-i.i.d. kurtotic.]

The NORMAL intervals are totally unreliable, which is to be expected due to the non-normal error distributions.
The MF/MF² intervals are best (by far) in the cases x_f = π/2 and x_f = 3π/2, attaining close to nominal coverage even with a sample size as low as n = 100.
The case x_f = π remains problematic, for the same reasons discussed previously.

Conclusions
Model-free prediction principle.
A novel viewpoint for inference: estimation and prediction.
Regression: point predictors and MF².
Prediction intervals with good coverage properties.
Totally automatic: no need to search for optimal transformations in regression...
...but very computationally intensive!

Bootstrap prediction intervals for time series [joint work with Li Pan, UCSD]
TIME SERIES DATA: X_1, ..., X_n. GOAL: predict X_{n+1} given the data (new notation).
Problem: we cannot choose x_f; it is given by the recent history of the time series, e.g., X_{n-1}, ..., X_{n-p} for some p.
Bootstrap series X*_1, X*_2, ..., X*_{n-1}, X*_n, X*_{n+1}, ... must be such that they have the same values for the recent history, i.e., X*_{n-1} = X_{n-1}, X*_{n-2} = X_{n-2}, ..., X*_{n-p} = X_{n-p}.

"Forward" bootstrap
Linear AR model: φ(B) X_t = ε_t with ε_t i.i.d.
Let φ̂ be the LS (scatterplot) estimate of φ based on X_1, ..., X_n.
Let φ̃ be the LS (scatterplot) estimate of φ based on X_1, ..., X_{n-p} together with altered values in place of X_{n-p+1}, ..., X_n, i.e., the last p values have been changed/corrupted/omitted.
Then φ̂ − φ̃ = O_p(1/n). Hence, we can artificially change the last p values in the bootstrap world and the effect will be negligible.
Bootstrap series: X*_1, ..., X*_{n-p}, X_{n-p+1}, ..., X_n, where X*_t = φ̂(B)^{-1} ε*_t (forward recursion) and X_{n-p+1}, ..., X_n are the same values as in the original dataset.
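A self-contained R sketch of the forward-bootstrap idea for an AR(p) prediction interval; the AR fit is done by ordinary least squares via lm, and all function names are illustrative rather than the exact algorithm of the Pan and Politis paper.

fit.ar <- function(X, p) {                       # OLS fit of an AR(p) with intercept
  Z   <- embed(X, p + 1)                         # columns: X_t, X_{t-1}, ..., X_{t-p}
  fit <- lm(Z[, 1] ~ Z[, -1])
  list(coef = coef(fit), resid = resid(fit) - mean(resid(fit)))
}
ar.pred <- function(coefs, lastp) unname(coefs[1] + sum(coefs[-1] * lastp))

forward.boot.pi <- function(X, p, B = 1000, alpha = 0.1) {
  n     <- length(X)
  fit   <- fit.ar(X, p)
  lastp <- X[n:(n - p + 1)]                      # recent history X_n, ..., X_{n-p+1}
  Xhat  <- ar.pred(fit$coef, lastp)              # point predictor of X_{n+1}
  roots <- replicate(B, {
    e  <- sample(fit$resid, n, replace = TRUE)
    Xs <- numeric(n); Xs[1:p] <- X[1:p]
    for (t in (p + 1):n)                         # forward recursion with phi-hat
      Xs[t] <- ar.pred(fit$coef, Xs[(t - 1):(t - p)]) + e[t]
    Xs[(n - p + 1):n] <- X[(n - p + 1):n]        # reset the last p values to the observed ones
    fits <- fit.ar(Xs, p)                        # re-estimate phi in the bootstrap world
    Xf   <- ar.pred(fit$coef, lastp) + sample(fit$resid, 1)   # future pseudo-value X*_{n+1}
    Xf - ar.pred(fits$coef, lastp)               # bootstrap root
  })
  Xhat + quantile(roots, c(alpha / 2, 1 - alpha / 2))
}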


More information

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Kuangyu Wen & Ximing Wu Texas A&M University Info-Metrics Institute Conference: Recent Innovations in Info-Metrics October

More information

What to do if Assumptions are Violated?

What to do if Assumptions are Violated? What to do if Assumptions are Violated? Abandon simple linear regression for something else (usually more complicated). Some examples of alternative models: weighted least square appropriate model if the

More information

A New Method for Varying Adaptive Bandwidth Selection

A New Method for Varying Adaptive Bandwidth Selection IEEE TRASACTIOS O SIGAL PROCESSIG, VOL. 47, O. 9, SEPTEMBER 1999 2567 TABLE I SQUARE ROOT MEA SQUARED ERRORS (SRMSE) OF ESTIMATIO USIG THE LPA AD VARIOUS WAVELET METHODS A ew Method for Varying Adaptive

More information

A nonparametric method of multi-step ahead forecasting in diffusion processes

A nonparametric method of multi-step ahead forecasting in diffusion processes A nonparametric method of multi-step ahead forecasting in diffusion processes Mariko Yamamura a, Isao Shoji b a School of Pharmacy, Kitasato University, Minato-ku, Tokyo, 108-8641, Japan. b Graduate School

More information

41903: Introduction to Nonparametrics

41903: Introduction to Nonparametrics 41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

Confidence intervals for kernel density estimation

Confidence intervals for kernel density estimation Stata User Group - 9th UK meeting - 19/20 May 2003 Confidence intervals for kernel density estimation Carlo Fiorio c.fiorio@lse.ac.uk London School of Economics and STICERD Stata User Group - 9th UK meeting

More information

Introduction to Nonparametric Regression

Introduction to Nonparametric Regression Introduction to Nonparametric Regression Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota)

More information

A Novel Nonparametric Density Estimator

A Novel Nonparametric Density Estimator A Novel Nonparametric Density Estimator Z. I. Botev The University of Queensland Australia Abstract We present a novel nonparametric density estimator and a new data-driven bandwidth selection method with

More information

Inference on distributions and quantiles using a finite-sample Dirichlet process

Inference on distributions and quantiles using a finite-sample Dirichlet process Dirichlet IDEAL Theory/methods Simulations Inference on distributions and quantiles using a finite-sample Dirichlet process David M. Kaplan University of Missouri Matt Goldman UC San Diego Midwest Econometrics

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

Lecture Notes 15 Prediction Chapters 13, 22, 20.4.

Lecture Notes 15 Prediction Chapters 13, 22, 20.4. Lecture Notes 15 Prediction Chapters 13, 22, 20.4. 1 Introduction Prediction is covered in detail in 36-707, 36-701, 36-715, 10/36-702. Here, we will just give an introduction. We observe training data

More information

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Quantile methods Class Notes Manuel Arellano December 1, 2009 1 Unconditional quantiles Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Q τ (Y ) q τ F 1 (τ) =inf{r : F

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

Introduction. Linear Regression. coefficient estimates for the wage equation: E(Y X) = X 1 β X d β d = X β

Introduction. Linear Regression. coefficient estimates for the wage equation: E(Y X) = X 1 β X d β d = X β Introduction - Introduction -2 Introduction Linear Regression E(Y X) = X β +...+X d β d = X β Example: Wage equation Y = log wages, X = schooling (measured in years), labor market experience (measured

More information

Basis Penalty Smoothers. Simon Wood Mathematical Sciences, University of Bath, U.K.

Basis Penalty Smoothers. Simon Wood Mathematical Sciences, University of Bath, U.K. Basis Penalty Smoothers Simon Wood Mathematical Sciences, University of Bath, U.K. Estimating functions It is sometimes useful to estimate smooth functions from data, without being too precise about the

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

Gaussian Processes (10/16/13)

Gaussian Processes (10/16/13) STA561: Probabilistic machine learning Gaussian Processes (10/16/13) Lecturer: Barbara Engelhardt Scribes: Changwei Hu, Di Jin, Mengdi Wang 1 Introduction In supervised learning, we observe some inputs

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015

More information

Quantile Regression for Residual Life and Empirical Likelihood

Quantile Regression for Residual Life and Empirical Likelihood Quantile Regression for Residual Life and Empirical Likelihood Mai Zhou email: mai@ms.uky.edu Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA Jong-Hyeon Jeong email: jeong@nsabp.pitt.edu

More information

Preface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation

Preface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation Preface Nonparametric econometrics has become one of the most important sub-fields in modern econometrics. The primary goal of this lecture note is to introduce various nonparametric and semiparametric

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

Nonparametric Inference in Cosmology and Astrophysics: Biases and Variants

Nonparametric Inference in Cosmology and Astrophysics: Biases and Variants Nonparametric Inference in Cosmology and Astrophysics: Biases and Variants Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Collaborators:

More information

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown. Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)

More information

Statistics 3657 : Moment Approximations

Statistics 3657 : Moment Approximations Statistics 3657 : Moment Approximations Preliminaries Suppose that we have a r.v. and that we wish to calculate the expectation of g) for some function g. Of course we could calculate it as Eg)) by the

More information

Machine Learning Linear Regression. Prof. Matteo Matteucci

Machine Learning Linear Regression. Prof. Matteo Matteucci Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

Sociology 740 John Fox. Lecture Notes. 1. Introduction. Copyright 2014 by John Fox. Introduction 1

Sociology 740 John Fox. Lecture Notes. 1. Introduction. Copyright 2014 by John Fox. Introduction 1 Sociology 740 John Fox Lecture Notes 1. Introduction Copyright 2014 by John Fox Introduction 1 1. Goals I To introduce the notion of regression analysis as a description of how the average value of a response

More information

Ultra High Dimensional Variable Selection with Endogenous Variables

Ultra High Dimensional Variable Selection with Endogenous Variables 1 / 39 Ultra High Dimensional Variable Selection with Endogenous Variables Yuan Liao Princeton University Joint work with Jianqing Fan Job Market Talk January, 2012 2 / 39 Outline 1 Examples of Ultra High

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics Lecture 3 : Regression: CEF and Simple OLS Zhaopeng Qu Business School,Nanjing University Oct 9th, 2017 Zhaopeng Qu (Nanjing University) Introduction to Econometrics Oct 9th,

More information

Chapter 4: Imputation

Chapter 4: Imputation Chapter 4: Imputation Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Basic Theory for imputation 3 Variance estimation after imputation 4 Replication variance estimation

More information

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Statistica Sinica 19 (2009), 71-81 SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Song Xi Chen 1,2 and Chiu Min Wong 3 1 Iowa State University, 2 Peking University and

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update 3. Juni 2013) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend

More information

Small Sample Corrections for LTS and MCD

Small Sample Corrections for LTS and MCD myjournal manuscript No. (will be inserted by the editor) Small Sample Corrections for LTS and MCD G. Pison, S. Van Aelst, and G. Willems Department of Mathematics and Computer Science, Universitaire Instelling

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information

Higher-Order von Mises Expansions, Bagging and Assumption-Lean Inference

Higher-Order von Mises Expansions, Bagging and Assumption-Lean Inference Higher-Order von Mises Expansions, Bagging and Assumption-Lean Inference Andreas Buja joint with: Richard Berk, Lawrence Brown, Linda Zhao, Arun Kuchibhotla, Kai Zhang Werner Stützle, Ed George, Mikhail

More information

Error distribution function for parametrically truncated and censored data

Error distribution function for parametrically truncated and censored data Error distribution function for parametrically truncated and censored data Géraldine LAURENT Jointly with Cédric HEUCHENNE QuantOM, HEC-ULg Management School - University of Liège Friday, 14 September

More information

Better Bootstrap Confidence Intervals

Better Bootstrap Confidence Intervals by Bradley Efron University of Washington, Department of Statistics April 12, 2012 An example Suppose we wish to make inference on some parameter θ T (F ) (e.g. θ = E F X ), based on data We might suppose

More information

2 Functions of random variables

2 Functions of random variables 2 Functions of random variables A basic statistical model for sample data is a collection of random variables X 1,..., X n. The data are summarised in terms of certain sample statistics, calculated as

More information

Nonparametric Regression. Changliang Zou

Nonparametric Regression. Changliang Zou Nonparametric Regression Institute of Statistics, Nankai University Email: nk.chlzou@gmail.com Smoothing parameter selection An overall measure of how well m h (x) performs in estimating m(x) over x (0,

More information

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1 36. Multisample U-statistics jointly distributed U-statistics Lehmann 6.1 In this topic, we generalize the idea of U-statistics in two different directions. First, we consider single U-statistics for situations

More information

Uncertainty Quantification for Inverse Problems. November 7, 2011

Uncertainty Quantification for Inverse Problems. November 7, 2011 Uncertainty Quantification for Inverse Problems November 7, 2011 Outline UQ and inverse problems Review: least-squares Review: Gaussian Bayesian linear model Parametric reductions for IP Bias, variance

More information

Bayesian spatial quantile regression

Bayesian spatial quantile regression Brian J. Reich and Montserrat Fuentes North Carolina State University and David B. Dunson Duke University E-mail:reich@stat.ncsu.edu Tropospheric ozone Tropospheric ozone has been linked with several adverse

More information

Why experimenters should not randomize, and what they should do instead

Why experimenters should not randomize, and what they should do instead Why experimenters should not randomize, and what they should do instead Maximilian Kasy Department of Economics, Harvard University Maximilian Kasy (Harvard) Experimental design 1 / 42 project STAR Introduction

More information

On the Robust Modal Local Polynomial Regression

On the Robust Modal Local Polynomial Regression International Journal of Statistical Sciences ISSN 683 5603 Vol. 9(Special Issue), 2009, pp 27-23 c 2009 Dept. of Statistics, Univ. of Rajshahi, Bangladesh On the Robust Modal Local Polynomial Regression

More information

Web-based Supplementary Material for. Dependence Calibration in Conditional Copulas: A Nonparametric Approach

Web-based Supplementary Material for. Dependence Calibration in Conditional Copulas: A Nonparametric Approach 1 Web-based Supplementary Material for Dependence Calibration in Conditional Copulas: A Nonparametric Approach Elif F. Acar, Radu V. Craiu, and Fang Yao Web Appendix A: Technical Details The score and

More information

Statistical View of Least Squares

Statistical View of Least Squares Basic Ideas Some Examples Least Squares May 22, 2007 Basic Ideas Simple Linear Regression Basic Ideas Some Examples Least Squares Suppose we have two variables x and y Basic Ideas Simple Linear Regression

More information

The Simple Regression Model. Part II. The Simple Regression Model

The Simple Regression Model. Part II. The Simple Regression Model Part II The Simple Regression Model As of Sep 22, 2015 Definition 1 The Simple Regression Model Definition Estimation of the model, OLS OLS Statistics Algebraic properties Goodness-of-Fit, the R-square

More information

Section 7: Local linear regression (loess) and regression discontinuity designs

Section 7: Local linear regression (loess) and regression discontinuity designs Section 7: Local linear regression (loess) and regression discontinuity designs Yotam Shem-Tov Fall 2015 Yotam Shem-Tov STAT 239/ PS 236A October 26, 2015 1 / 57 Motivation We will focus on local linear

More information

Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference

Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference 1 / 172 Bootstrap inference Francisco Cribari-Neto Departamento de Estatística Universidade Federal de Pernambuco Recife / PE, Brazil email: cribari@gmail.com October 2014 2 / 172 Unpaid advertisement

More information

Computational treatment of the error distribution in nonparametric regression with right-censored and selection-biased data

Computational treatment of the error distribution in nonparametric regression with right-censored and selection-biased data Computational treatment of the error distribution in nonparametric regression with right-censored and selection-biased data Géraldine Laurent 1 and Cédric Heuchenne 2 1 QuantOM, HEC-Management School of

More information

STAT Section 2.1: Basic Inference. Basic Definitions

STAT Section 2.1: Basic Inference. Basic Definitions STAT 518 --- Section 2.1: Basic Inference Basic Definitions Population: The collection of all the individuals of interest. This collection may be or even. Sample: A collection of elements of the population.

More information

F9 F10: Autocorrelation

F9 F10: Autocorrelation F9 F10: Autocorrelation Feng Li Department of Statistics, Stockholm University Introduction In the classic regression model we assume cov(u i, u j x i, x k ) = E(u i, u j ) = 0 What if we break the assumption?

More information

Lecture 3: Statistical Decision Theory (Part II)

Lecture 3: Statistical Decision Theory (Part II) Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical

More information

Post-exam 2 practice questions 18.05, Spring 2014

Post-exam 2 practice questions 18.05, Spring 2014 Post-exam 2 practice questions 18.05, Spring 2014 Note: This is a set of practice problems for the material that came after exam 2. In preparing for the final you should use the previous review materials,

More information

Independent and conditionally independent counterfactual distributions

Independent and conditionally independent counterfactual distributions Independent and conditionally independent counterfactual distributions Marcin Wolski European Investment Bank M.Wolski@eib.org Society for Nonlinear Dynamics and Econometrics Tokyo March 19, 2018 Views

More information

Comments on: Model-free model-fitting and predictive distributions : Applications to Small Area Statistics and Treatment Effect Estimation

Comments on: Model-free model-fitting and predictive distributions : Applications to Small Area Statistics and Treatment Effect Estimation Article Comments on: Model-free model-fitting and predictive distributions : Applications to Small Area Statistics and Treatment Effect Estimation SPERLICH, Stefan Andréas Reference SPERLICH, Stefan Andréas.

More information

Modelling Non-linear and Non-stationary Time Series

Modelling Non-linear and Non-stationary Time Series Modelling Non-linear and Non-stationary Time Series Chapter 2: Non-parametric methods Henrik Madsen Advanced Time Series Analysis September 206 Henrik Madsen (02427 Adv. TS Analysis) Lecture Notes September

More information

4 Nonparametric Regression

4 Nonparametric Regression 4 Nonparametric Regression 4.1 Univariate Kernel Regression An important question in many fields of science is the relation between two variables, say X and Y. Regression analysis is concerned with the

More information

1 Empirical Likelihood

1 Empirical Likelihood Empirical Likelihood February 3, 2016 Debdeep Pati 1 Empirical Likelihood Empirical likelihood a nonparametric method without having to assume the form of the underlying distribution. It retains some of

More information

Nonparametric confidence intervals. for receiver operating characteristic curves

Nonparametric confidence intervals. for receiver operating characteristic curves Nonparametric confidence intervals for receiver operating characteristic curves Peter G. Hall 1, Rob J. Hyndman 2, and Yanan Fan 3 5 December 2003 Abstract: We study methods for constructing confidence

More information

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples Bayesian inference for sample surveys Roderick Little Module : Bayesian models for simple random samples Superpopulation Modeling: Estimating parameters Various principles: least squares, method of moments,

More information