Introduction to Regression


1 Introduction to Regression p. 1/97 Introduction to Regression Chad Schafer Carnegie Mellon University

2 Introduction to Regression p. 1/97 Acknowledgement Larry Wasserman, All of Nonparametric Statistics, Springer, 2006

3 Introduction to Regression p. 2/97 Outline General Concepts of Regression, Bias-Variance Tradeoff Linear Smoothers Cross Validation Local Polynomial Regression Confidence Bands Basis Methods: Splines Multiple Regression

4 Introduction to Regression p. 3/97 Basic Concepts in Regression The Regression Problem: Observe (X_1, Y_1), ..., (X_n, Y_n); estimate f(x) = E(Y | X = x). Equivalently, Y_i = f(X_i) + ε_i, where E(ε_i) = 0. Simple estimators: f̂(x) = β̂_0 + β̂_1 x (parametric); f̂(x) = mean{Y_i : |X_i − x| ≤ h} (nonparametric)
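
To make the two simple estimators concrete, here is a minimal R sketch (not from the slides) that simulates data from a hypothetical regression function and evaluates both estimators; the curve, sample size, and bandwidth are arbitrary illustrative choices.

set.seed(1)
n <- 200
x <- runif(n)
f <- function(x) sin(2 * pi * x)              # hypothetical true regression function
y <- f(x) + rnorm(n, sd = 0.3)
# Parametric estimator: least-squares line
beta <- coef(lm(y ~ x))
fhat.lin <- function(x0) unname(beta[1] + beta[2] * x0)
# Nonparametric estimator: local average over the window |X_i - x0| <= h
h <- 0.1
fhat.loc <- function(x0) mean(y[abs(x - x0) <= h])
c(true = f(0.25), linear = fhat.lin(0.25), local = fhat.loc(0.25))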

5 Introduction to Regression p. 3/97 Next... We will first quickly look at some simple examples from astronomy

6 Introduction to Regression p. 4/97 Example: Type Ia Supernovae [Figure: distance modulus versus redshift for the 182 Type Ia supernovae of Riess, et al. (2007).]

7 Introduction to Regression p. 5/97 Example: Type Ia Supernovae Assumption: Observed pairs (z_i, Y_i) are realizations of Y_i = f(z_i) + σ_i ε_i, where the ε_i are i.i.d. standard normal. The error standard deviations σ_i are assumed known. Objective: Estimate f(·).

8 Introduction to Regression p. 6/97 Example: Type Ia Supernovae [Figure: fourth-order polynomial fit of distance modulus versus redshift.]

9 Introduction to Regression p. 7/97 Example: Type Ia Supernovae [Figure: fifth-order polynomial fit of distance modulus versus redshift.]

10 Introduction to Regression p. 8/97 Example: Type Ia Supernovae [Figure: nonparametric estimate of distance modulus versus redshift.]

11 Introduction to Regression p. 9/97 Example: Type Ia Supernovae ΛCDM, two-parameter model: f(z) = 5 log_10 [ (c(1 + z)/H_0) ∫_0^z du / √(Ω_m (1 + u)^3 + (1 − Ω_m)) ]

12 Introduction to Regression p. 10/97 Example: Type Ia Supernovae [Figure: the fitted model overlaid on the distance modulus versus redshift data.] Estimate with H_0 and Ω_m set at their maximum likelihood estimates.

13 Introduction to Regression p. 11/97 Example: Dark Energy Y_i = f(z_i) + ε_i, i = 1, ..., n, where Y_i is the luminosity of the i-th supernova and z_i is its redshift. Want to estimate the equation of state w(z), a transformation w = T(f, f', f'') of the regression function and its first two derivatives: w(z) = −1 + ((1 + z)/3) [3 H_0^2 Ω_M (1 + z)^2 + 2 f''(z)/(f'(z))^3] / [H_0^2 Ω_M (1 + z)^3 − 1/(f'(z))^2].

14 Introduction to Regression p. 11/97 Next... We will now discuss the fundamental challenge of constructing a regression model: How complex should it be?

15 Introduction to Regression p. 12/97 The Bias-Variance Tradeoff Must choose the model to achieve a balance between two extremes. Too simple: high bias, i.e., E(f̂(x)) not close to f(x); precise, but not accurate. Too complex: high variance, i.e., Var(f̂(x)) large; accurate, but not precise. This is the classic bias-variance tradeoff; it could also be called the accuracy-precision tradeoff.

16 Introduction to Regression p. 13/97 The Bias-Variance Tradeoff Linear Regression Model: Fit models of the form f(x) = β_0 + Σ_{i=1}^p β_i g_i(x), where the g_i(·) are fixed, specified functions. Increasing p yields a more complex model.

17 Introduction to Regression p. 14/97 The Bias-Variance Tradeoff Nonparametric Case: Every nonparametric procedure requires a smoothing parameter h. Consider the regression estimator based on local averaging: f̂(x) = mean{Y_i : |X_i − x| ≤ h}. Increasing h yields a smoother estimate f̂.

18 Introduction to Regression p. 15/97 The Bias-Variance Tradeoff Can formalize this via an appropriate measure of error in the estimator. Squared error loss: L(f(x), f̂(x)) = (f(x) − f̂(x))^2. Mean squared error (MSE, or risk): MSE = R(f(x), f̂(x)) = E(L(f(x), f̂(x))).

19 Introduction to Regression p. 16/97 The Bias-Variance Tradeoff The key result: R(f(x), f̂(x)) = bias_x^2 + variance_x, where bias_x = E(f̂(x)) − f(x) and variance_x = Var(f̂(x)). Often written as MSE = BIAS^2 + VARIANCE.

20 Introduction to Regression p. 17/97 The Bias-Variance Tradeoff Mean Integrated Squared Error: MISE = ∫ R(f(x), f̂(x)) dx = ∫ bias_x^2 dx + ∫ variance_x dx. Empirical version: (1/n) Σ_{i=1}^n R(f(x_i), f̂(x_i)).

21 Introduction to Regression p. 18/97 The Bias-Variance Tradeoff [Figure: risk, squared bias, and variance as functions of the amount of smoothing; squared bias increases and variance decreases as smoothing goes from less to more, and the risk is minimized at an intermediate, optimal amount of smoothing.]

22 Introduction to Regression p. 19/97 The Bias-Variance Tradeoff For nonparametric procedures, when x is one-dimensional, MISE ≈ c_1 h^4 + c_2/(nh), which is minimized at h = O(1/n^{1/5}). Hence MISE = O(1/n^{4/5}), whereas for parametric problems MISE = O(1/n).

23 Introduction to Regression p. 19/97 Next... The term nonparametric is confusing. People ask: Is this not just a parametric model where there are many more parameters?

24 Introduction to Regression p. 20/97 What is nonparametric? [Diagram: Data and Assumptions both feed into the Estimate.] In the parametric case, the influence of the assumptions is fixed.

25 Introduction to Regression p. 21/97 What is nonparametric? [Diagram: Data and Assumptions feed into the Estimate, with the weight given to the assumptions controlled by h.] In the nonparametric case, the influence of the assumptions is controlled by h, where h = O(1/n^{1/5}).

26 Introduction to Regression p. 22/97 Regression Parametric Regression: Y = β_0 + β_1 X + ε; Y = β_0 + β_1 X_1 + ··· + β_d X_d + ε; Y = β_0 e^{β_1 X_1} + β_2 X_2 + ε. Nonparametric Regression: Y = f(X) + ε; Y = f_1(X_1) + ··· + f_d(X_d) + ε; Y = f(X_1, X_2) + ε.

27 Introduction to Regression p. 22/97 Next... Now we will begin to discuss different procedures for fitting regression models, and some elements they have in common. Namely, they are all linear smoothers.

28 Introduction to Regression p. 23/97 Regression Methods: Linear Regression Binning Local averaging Kernels Local polynomials Splines

29 Introduction to Regression p. 24/97 Linear Smoothers All are linear smoothers: f̂(x) = Σ_i Y_i l_i(x) for some weights l_1(x), ..., l_n(x). The vector of weights depends on the target point x. Each method involves specification of a parameter that controls model smoothness/complexity (either p or h).

30 Introduction to Regression p. 25/97 Linear Smoothers If f̂(x) = Σ_{i=1}^n l_i(x) Y_i, then Ŷ = L Y, where Ŷ = (f̂(X_1), ..., f̂(X_n))^T, Y = (Y_1, ..., Y_n)^T, and L is the n × n smoothing matrix whose (i, j) entry is l_j(X_i), so row i holds the weights l_1(X_i), ..., l_n(X_i). The effective degrees of freedom is ν = trace(L) = Σ_{i=1}^n L_ii. It can be interpreted as the number of parameters in the model.
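
As an illustration (not from the slides), the following R sketch builds the smoothing matrix L for a simple kernel smoother, with a Gaussian kernel, simulated data, and an arbitrary bandwidth, and then computes the effective degrees of freedom.

set.seed(1)
n <- 300
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
h <- 0.1
K <- function(u) dnorm(u)                        # Gaussian kernel (illustrative choice)
W <- outer(x, x, function(a, b) K((a - b) / h))  # W[i, j] = K((X_i - X_j) / h)
L <- W / rowSums(W)                              # row i holds l_1(X_i), ..., l_n(X_i)
yhat <- as.vector(L %*% y)                       # fitted values, Yhat = L Y
nu <- sum(diag(L))                               # effective degrees of freedom, trace(L)
nu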

31 Introduction to Regression p. 26/97 Linear Smoothers Consider the extreme cases: When l_i(X_j) = 1/n for all i, j, then f̂(X_j) = Ȳ for all j, and ν = 1. When l_i(X_j) = δ_ij for all i, j, then f̂(X_j) = Y_j, and ν = n.

32 Introduction to Regression p. 26/97 Next... Linear regression models are the classic approach to regression. Here, the response is modelled as a linear combination of p selected predictors.

33 Introduction to Regression p. 27/97 Linear Regression Assume that Y = β_0 + Σ_{i=1}^p β_i g_i(x) + ε, where the g_i(·) are specified functions, not estimated. Assume that E(ε) = 0 and Var(ε) = σ^2. Often ε is assumed to be Gaussian, but this is not essential.

34 Introduction to Regression p. 28/97 Linear Regression The Design Matrix: D is the n × (p + 1) matrix whose i-th row is (1, g_1(X_i), g_2(X_i), ..., g_p(X_i)). Then L = D(D^T D)^{-1} D^T is the projection matrix. Note that ν = trace(L) = p + 1.

35 Introduction to Regression p. 29/97 Linear Regression Let Y be an n-vector filled with Y_1, Y_2, ..., Y_n. The Least Squares Estimate of β: β̂ = (D^T D)^{-1} D^T Y. The Fitted Values: Ŷ = LY = Dβ̂. The Residuals: ε̂ = Y − Ŷ = (I − L)Y.

36 Introduction to Regression p. 30/97 Linear Regression The estimates of β given above minimize the sum of squared errors: SSE = Σ_{i=1}^n ε̂_i^2 = Σ_{i=1}^n (Y_i − Ŷ_i)^2. If the ε_i are Gaussian, then β̂ is Gaussian, leading to confidence intervals and hypothesis tests.
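
A small R check (not from the slides) of these matrix formulas on simulated data with a single predictor; the estimate from the explicit formula agrees with lm().

set.seed(1)
n <- 100
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
D <- cbind(1, x)                             # design matrix: intercept column and x
betahat <- solve(t(D) %*% D, t(D) %*% y)     # (D^T D)^{-1} D^T Y
L <- D %*% solve(t(D) %*% D) %*% t(D)        # projection matrix
yhat <- L %*% y                              # fitted values
ehat <- y - yhat                             # residuals
sum(diag(L))                                 # nu = trace(L) = p + 1 = 2
cbind(betahat, coef(lm(y ~ x)))              # the two sets of estimates agree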

37 Introduction to Regression p. 31/97 Linear Regression in R Linear regression in R is done via the command lm(): > holdlm = lm(y ~ x) The default is to include an intercept term. To exclude it, use > holdlm = lm(y ~ x - 1)

38 Introduction to Regression p. 32/97 Linear Regression in R [Figure: scatterplot of y versus x.]

39 Introduction to Regression p. 33/97 Linear Regression in R Fitting the model Y = β_0 + β_1 x + ε. The summary() shows the key output: > holdlm = lm(y ~ x) > summary(holdlm) < CUT SOME HERE > Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) ... * x ... e-10 *** < CUT SOME HERE > Note that β̂_1 = 2.07, and the SE for β̂_1 is approximately 0.17.

40 Introduction to Regression p. 34/97 Linear Regression in R Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) ... * x ... e-10 *** If the ε are Gaussian, then the last column gives the p-value for the test of H_0: β_i = 0 versus H_1: β_i ≠ 0. Also, if the ε are Gaussian, β̂_i ± 2 ŜE(β̂_i) is an approximate 95% confidence interval for β_i.

41 Introduction to Regression p. 35/97 Linear Regression in R [Figure: residuals versus fitted values.] It is always a good idea to look at a plot of the residuals versus the fitted values; there should be no apparent pattern. > plot(holdlm$fitted.values, holdlm$residuals)

42 Introduction to Regression p. 36/97 Linear Regression in R [Figure: normal probability plot of the residuals, sample quantiles versus theoretical quantiles.] Is it reasonable to assume the ε are Gaussian? Check the normal probability plot of the residuals. > qqnorm(holdlm$residuals) > qqline(holdlm$residuals)

43 Introduction to Regression p. 37/97 Linear Regression in R Simple to include more terms in the model: > holdlm = lm(y ~ x + z) Unfortunately, this does not work: > holdlm = lm(y ~ x + x^2) Instead, create a new variable and use that: > x2 = x^2 > holdlm = lm(y ~ x + x2)
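
A standard alternative, not shown on the slide, is to protect the power with I() inside the formula, or to use poly(); both give the quadratic fit without creating a new variable.

holdlm = lm(y ~ x + I(x^2))
holdlm = lm(y ~ poly(x, 2, raw = TRUE))   # same fitted values as the I(x^2) version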

44 Introduction to Regression p. 37/97 Next... Now we will begin our discussion of nonparametric regression. To motivate the discussion, we start with a simple, but naive, approach: binning.

45 Introduction to Regression p. 38/97 Binning Divide the x-axis into bins B_1, B_2, ... of width h. f̂ is a step function based on averaging the Y_i in each bin: for x ∈ B_j, f̂(x) = mean{Y_i : X_i ∈ B_j}. The (arbitrary) choice of the bin boundaries can affect inference, especially when h is large.
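
A minimal sketch of the binned estimator in R on simulated data; the true curve, sample size, and bin width h are illustrative choices.

set.seed(1)
x <- runif(300)
y <- sin(2 * pi * x) + rnorm(300, sd = 0.3)
h <- 0.1
breaks <- seq(min(x), max(x) + h, by = h)                # bin boundaries B_1, B_2, ...
bins <- cut(x, breaks, include.lowest = TRUE)
binmeans <- tapply(y, bins, mean)                        # mean of the Y_i in each bin
fhat.bin <- function(x0) binmeans[as.integer(cut(x0, breaks, include.lowest = TRUE))]
fhat.bin(c(0.05, 0.55))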

46 Introduction to Regression p. 39/97 Binning WMAP data, binned estimate with h = 15. [Figure: Cl versus l.]

47 Introduction to Regression p. 40/97 Binning WMAP data, binned estimate with h = 50. [Figure: Cl versus l.]

48 Introduction to Regression p. 41/97 Binning WMAP data, binned estimate with h = 75. [Figure: Cl versus l.]

49 Introduction to Regression p. 41/97 Next... A step beyond binning is to take local averages to estimate the regression function.

50 Introduction to Regression p. 42/97 Local Averaging For each x let N_x = [x − h/2, x + h/2]. This is a moving window of length h, centered at x. Define f̂(x) = mean{Y_i : X_i ∈ N_x}. This is like binning but removes the arbitrary boundaries.

51 Introduction to Regression p. 43/97 Local Averaging WMAP data, local average estimate with h = 15. [Figure: Cl versus l, binned and local averaging estimates overlaid.]

52 Introduction to Regression p. 44/97 Local Averaging WMAP data, local average estimate with h = 50. [Figure: Cl versus l, binned and local averaging estimates overlaid.]

53 Introduction to Regression p. 45/97 Local Averaging WMAP data, local average estimate with h = 75. [Figure: Cl versus l, binned and local averaging estimates overlaid.]

54 Introduction to Regression p. 45/97 Next... Local averaging is a special case of kernel regression.

55 Introduction to Regression p. 46/97 Kernel Regression The local average estimator can be written f̂(x) = Σ_{i=1}^n Y_i K((x − X_i)/h) / Σ_{i=1}^n K((x − X_i)/h), where K(x) = 1 if |x| < 1/2 and 0 otherwise. Can improve this by using a function K which is smoother.

56 Introduction to Regression p. 47/97 Kernels A kernel is any smooth function K such that K(x) ≥ 0, ∫ K(x) dx = 1, ∫ x K(x) dx = 0, and σ_K^2 ≡ ∫ x^2 K(x) dx > 0. Some commonly used kernels are the following: the boxcar kernel: K(x) = (1/2) I(|x| < 1); the Gaussian kernel: K(x) = (1/√(2π)) e^{−x^2/2}; the Epanechnikov kernel: K(x) = (3/4)(1 − x^2) I(|x| < 1); the tricube kernel: K(x) = (70/81)(1 − |x|^3)^3 I(|x| < 1).
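
Written as R functions, these are a direct transcription of the definitions above (with normalizing constants so that each integrates to one):

boxcar  <- function(x) 0.5 * (abs(x) < 1)
gauss   <- function(x) dnorm(x)                              # (1/sqrt(2*pi)) exp(-x^2/2)
epan    <- function(x) 0.75 * (1 - x^2) * (abs(x) < 1)
tricube <- function(x) (70 / 81) * (1 - abs(x)^3)^3 * (abs(x) < 1)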

57 Introduction to Regression p. 48/97 Kernels

58 Introduction to Regression p. 49/97 Kernel Regression We can write this as f̂(x) = Σ_i Y_i l_i(x), where l_i(x) = K((x − X_i)/h) / Σ_{j=1}^n K((x − X_j)/h).
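
A sketch of the kernel regression estimator in R with the Gaussian kernel; the simulated data and bandwidth are illustrative.

set.seed(1)
x <- runif(300)
y <- sin(2 * pi * x) + rnorm(300, sd = 0.3)
h <- 0.05
fhat.nw <- function(x0) {
  w <- dnorm((x0 - x) / h)      # unnormalized weights K((x0 - X_i) / h)
  sum(w * y) / sum(w)           # kernel-weighted average of the Y_i
}
grid <- seq(0, 1, length.out = 200)
fhat <- sapply(grid, fhat.nw)
# plot(x, y); lines(grid, fhat)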

59 Introduction to Regression p. 50/97 Kernel Regression WMAP data, kernel regression estimates, h = 15. [Figure: Cl versus l, with boxcar, Gaussian, and tricube kernel estimates overlaid.]

60 Introduction to Regression p. 51/97 Kernel Regression WMAP data, kernel regression estimates, h = 50. [Figure: Cl versus l, with boxcar, Gaussian, and tricube kernel estimates overlaid.]

61 Introduction to Regression p. 52/97 Kernel Regression WMAP data, kernel regression estimates, h = 75. [Figure: Cl versus l, with boxcar, Gaussian, and tricube kernel estimates overlaid.]

62 Introduction to Regression p. 53/97 Kernel Regression MISE ≈ (h^4/4) (∫ x^2 K(x) dx)^2 ∫ (f''(x) + 2 f'(x) g'(x)/g(x))^2 dx + (σ^2 ∫ K^2(x) dx)/(nh) ∫ (1/g(x)) dx, where g(x) is the density of X. What is especially notable is the presence of the term 2 f'(x) g'(x)/g(x), the design bias. Also, the bias is large near the boundary. We can reduce these biases using local polynomials.

63 Introduction to Regression p. 54/97 Kernel Regression WMAP data, h = 50. Note the boundary bias. [Figure: Cl versus l, with boxcar, Gaussian, and tricube kernel estimates overlaid.]

64 Introduction to Regression p. 54/97 Next... The deficiencies of kernel regression can be addressed by considering models which, on small scales, model the regression function as a low order polynomial.

65 Introduction to Regression p. 55/97 Local Polynomial Regression Recall polynomial regression: f̂(x) = β̂_0 + β̂_1 x + β̂_2 x^2 + ··· + β̂_p x^p, where β̂ = (β̂_0, ..., β̂_p) is obtained by least squares: minimize Σ_{i=1}^n (Y_i − [β_0 + β_1 X_i + β_2 X_i^2 + ··· + β_p X_i^p])^2.

66 Introduction to Regression p. 56/97 Local Polynomial Regression Local polynomial regression: approximate f(x) locally by a different polynomial for every x: f(u) ≈ β_0(x) + β_1(x)(u − x) + β_2(x)(u − x)^2 + ··· + β_p(x)(u − x)^p for u near x. Estimate (β̂_0(x), ..., β̂_p(x)) by local least squares: minimize Σ_{i=1}^n (Y_i − [β_0(x) + β_1(x)(X_i − x) + ··· + β_p(x)(X_i − x)^p])^2 K((X_i − x)/h), where the last factor is the kernel weight. Then f̂(x) = β̂_0(x).

67 Introduction to Regression p. 57/97 Local Polynomial Regression Taking p = 0 yields the kernel regression estimator: f̂_n(x) = Σ_{i=1}^n l_i(x) Y_i with l_i(x) = K((x − X_i)/h) / Σ_{j=1}^n K((x − X_j)/h). Taking p = 1 yields the local linear estimator. This is the best all-purpose smoother. Choice of kernel K: not important. Choice of bandwidth h: crucial.

68 Introduction to Regression p. 58/97 Local Polynomial Regression The local polynomial regression estimate is f̂_n(x) = Σ_{i=1}^n l_i(x) Y_i, where l(x)^T = (l_1(x), ..., l_n(x)) = e_1^T (X_x^T W_x X_x)^{-1} X_x^T W_x, e_1 = (1, 0, ..., 0)^T, X_x is the n × (p + 1) matrix whose i-th row is (1, X_i − x, (X_i − x)^2/2!, ..., (X_i − x)^p/p!), and W_x is the n × n diagonal matrix with i-th diagonal entry K((x − X_i)/h).
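
A sketch of the local linear (p = 1) estimate at a single target point, following the weight formula above; the Gaussian kernel, bandwidth, and simulated data are illustrative choices.

set.seed(1)
x <- runif(300)
y <- sin(2 * pi * x) + rnorm(300, sd = 0.3)
h <- 0.05
fhat.loclin <- function(x0) {
  Xx <- cbind(1, x - x0)                                # columns 1 and (X_i - x0)
  Wx <- diag(dnorm((x - x0) / h))                       # kernel weights on the diagonal
  l <- solve(t(Xx) %*% Wx %*% Xx, t(Xx) %*% Wx)[1, ]    # first row is l(x0)^T
  sum(l * y)                                            # fhat(x0) = beta0hat(x0)
}
fhat.loclin(0.5)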

69 Introduction to Regression p. 59/97 Local Polynomial Regression Note that f̂(x) = Σ_{i=1}^n l_i(x) Y_i is a linear smoother. Define Ŷ = (Ŷ_1, ..., Ŷ_n), where Ŷ_i = f̂(X_i). Then Ŷ = LY, where L is the smoothing matrix whose (i, j) entry is l_j(X_i). The effective degrees of freedom is ν = trace(L) = Σ_{i=1}^n L_ii.

70 Introduction to Regression p. 59/97 Next... Nonparametric procedures require a choice of the smoothing parameter h. Next we will discuss how this can be selected in a data-driven manner via cross-validation.

71 Introduction to Regression p. 60/97 Choosing the Bandwidth Estimate the risk (1/n) Σ_{i=1}^n E(f̂(x_i) − f(x_i))^2 with the leave-one-out cross-validation score: CV = (1/n) Σ_{i=1}^n (Y_i − f̂_{(−i)}(X_i))^2, where f̂_{(−i)} is the estimator obtained by omitting the i-th pair (X_i, Y_i).

72 Introduction to Regression p. 61/97 Choosing the Bandwidth Amazing shortcut formula: CV = (1/n) Σ_{i=1}^n [(Y_i − f̂_n(x_i)) / (1 − L_ii)]^2. A commonly used approximation is GCV (generalized cross-validation): GCV = (1/n) Σ_{i=1}^n (Y_i − f̂(x_i))^2 / (1 − ν/n)^2, with ν = trace(L).
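
A sketch in R computing both scores for a kernel smoother from its smoothing matrix over a grid of bandwidths; the simulated data and the grid of h values are illustrative.

set.seed(1)
n <- 300
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
cv.scores <- function(h) {
  W <- outer(x, x, function(a, b) dnorm((a - b) / h))
  L <- W / rowSums(W)
  yhat <- as.vector(L %*% y)
  nu <- sum(diag(L))
  cv <- mean(((y - yhat) / (1 - diag(L)))^2)    # leave-one-out shortcut formula
  gcv <- mean((y - yhat)^2) / (1 - nu / n)^2    # GCV approximation
  c(cv = cv, gcv = gcv)
}
hs <- seq(0.02, 0.20, by = 0.02)
sapply(hs, cv.scores)                           # choose the h minimizing CV or GCV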

73 Introduction to Regression p. 62/97 Theoretical Aside Why local linear (p = 1) is better than kernel (p = 0): Both have (approximate) variance σ^2(x) ∫ K^2(u) du / (g(x) nh). The kernel estimator has bias h^2 [ (1/2) f''(x) + f'(x) g'(x)/g(x) ] ∫ u^2 K(u) du, whereas the local linear estimator has asymptotic bias (h^2/2) f''(x) ∫ u^2 K(u) du. The local linear estimator is free from design bias. At the boundary points, the kernel estimator has asymptotic bias of O(h) while the local linear estimator has bias O(h^2).

74 Introduction to Regression p. 62/97 Next... Next, we will go over a bit of how to perform local polynomial regression in R.

75 Introduction to Regression p. 63/97 Using locfit() Need to include the locfit library: > install.packages("locfit") > library(locfit) > result = locfit(y ~ x, alpha=c(0, 1.5), deg=1) y and x are vectors; the second argument to alpha gives the bandwidth (h); the first argument to alpha specifies the nearest neighbor fraction, an alternative to the bandwidth; fitted(result) gives the fitted values f̂(x_i); residuals(result) gives the residuals Y_i − f̂(x_i).
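
A short usage sketch, assuming x, y, and result from the call above: extract the fitted values and overlay the fit on the data.

plot(x, y, pch = 16, cex = 0.4)
ord <- order(x)
lines(x[ord], fitted(result)[ord], col = "red")   # fitted values, sorted by x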

76 Introduction to Regression p. 64/97 Using locfit() [Figure: simulated data, degree=1, n=1000, sigma=0.1; shown are the true curve x(1 − x) sin(2.1π/(x + 0.2)), an estimate using a fixed h, and the estimate using the GCV-optimal h.]

77 Introduction to Regression p. 65/97 Using locfit() [Figure: same simulated data, degree=1, n=1000, sigma=0.1; shown are the true curve x(1 − x) sin(2.1π/(x + 0.2)), the estimate using h = 0.1, and the estimate using the GCV-optimal h.]

78 Introduction to Regression p. 66/97 Using locfit() [Figure: simulated data, degree=1, n=1000, sigma=0.5; shown are the true curve x + 5 exp(−4x^2), the estimate using h = 0.05, and the estimate using the GCV-optimal h.]

79 Introduction to Regression p. 67/97 Using locfit() [Figure: same simulated data, degree=1, n=1000, sigma=0.5; shown are the true curve x + 5 exp(−4x^2), the estimate using h = 0.5, and the estimate using the GCV-optimal h.]

80 Introduction to Regression p. 68/97 Example WMAP data, local linear fit, h = 15. [Figure: the estimate using h = 15 and the estimate using the GCV-optimal h.]

81 Introduction to Regression p. 69/97 Example WMAP data, local linear fit, h = 125. [Figure: the estimate using h = 125 and the estimate using the GCV-optimal h.]

82 Introduction to Regression p. 69/97 Next... We should quantify the amount of error in our estimator for the regression function. Next we will look at how this can be done using local polynomial regression.

83 Introduction to Regression p. 70/97 Variance Estimation Let σ̂^2 = Σ_{i=1}^n (Y_i − f̂(x_i))^2 / (n − 2ν + ν̃), where ν = tr(L) and ν̃ = tr(L^T L) = Σ_{i=1}^n ‖l(x_i)‖^2. If f is sufficiently smooth, then σ̂^2 is a consistent estimator of σ^2.

84 Introduction to Regression p. 71/97 Variance Estimation For the WMAP data, using the local linear fit: > wmap = read.table("wmap.dat", header=T) > opth = 63.0 > locfitwmap = locfit(wmap$cl[1:700] ~ wmap$ell[1:700], alpha=c(0,opth), deg=1) > nu = as.numeric(locfitwmap$dp[6]) > nutilde = as.numeric(locfitwmap$dp[7]) > sigmasqrhat = sum(residuals(locfitwmap)^2)/(700-2*nu+nutilde) > sigmasqrhat But it does not seem reasonable to assume homoscedasticity...

85 Introduction to Regression p. 72/97 Variance Estimation Allow σ to be a function of x: Y_i = f(x_i) + σ(x_i) ε_i. Let Z_i = log(Y_i − f(x_i))^2 and δ_i = log ε_i^2. Then Z_i = log(σ^2(x_i)) + δ_i. This suggests estimating log σ^2(x) by regressing the log squared residuals on x.

86 Introduction to Regression p. 73/97 Variance Estimation 1. Estimate f(x) with any nonparametric method to get an estimate f̂_n(x). 2. Define Z_i = log(Y_i − f̂_n(x_i))^2. 3. Regress the Z_i on the x_i (again using any nonparametric method) to get an estimate q̂(x) of log σ^2(x), and let σ̂^2(x) = e^{q̂(x)}.
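
A sketch of these three steps using locfit(), in the same style as the earlier calls; the simulated heteroscedastic data and the two bandwidths are illustrative choices.

library(locfit)
set.seed(1)
x <- sort(runif(300))
y <- sin(2 * pi * x) + (0.1 + 0.4 * x) * rnorm(300)   # nonconstant sigma(x)
fit.f <- locfit(y ~ x, alpha = c(0, 0.1), deg = 1)    # step 1: estimate f
z <- log(residuals(fit.f)^2)                          # step 2: Z_i = log((Y_i - fhat(x_i))^2)
fit.q <- locfit(z ~ x, alpha = c(0, 0.3), deg = 1)    # step 3: regress the Z_i on the x_i
sigmahat <- sqrt(exp(fitted(fit.q)))                  # sigmahat(x_i) = exp(qhat(x_i) / 2)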

87 Introduction to Regression p. 74/97 Example WMAP data, log squared residuals, local linear fit. [Figure: log squared residual versus l.]

88 Introduction to Regression p. 75/97 Example With local linear fit, h = 130, chosen via GCV. [Figure: log squared residual versus l, with the fitted curve.]

89 Introduction to Regression p. 76/97 Example Estimating σ(x): [Figure: σ̂ versus l.]

90 Introduction to Regression p. 77/97 Confidence Bands Recall that f̂(x) = Σ_i Y_i l_i(x), so Var(f̂(x)) = Σ_i σ^2(X_i) l_i^2(x). An approximate 1 − α confidence interval for f(x) is f̂(x) ± z_{α/2} √(Σ_i σ̂^2(X_i) l_i^2(x)). When σ(x) is smooth, we can approximate √(Σ_i σ^2(X_i) l_i^2(x)) ≈ σ(x) ‖l(x)‖.
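
A sketch of a pointwise band built directly from the weights of a kernel smoother, taking σ(x) to be a known constant for simplicity; the kernel, bandwidth, simulated data, and the value of σ are all illustrative.

set.seed(1)
n <- 300
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
h <- 0.05
band <- function(x0, sigma = 0.3, z = 1.96) {
  l <- dnorm((x0 - x) / h)
  l <- l / sum(l)                        # weights l_i(x0)
  fhat <- sum(l * y)
  se <- sqrt(sum(sigma^2 * l^2))         # sqrt of sum_i sigma^2(X_i) l_i(x0)^2
  c(lower = fhat - z * se, estimate = fhat, upper = fhat + z * se)
}
band(0.5)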

91 Introduction to Regression p. 78/97 Confidence Bands Two caveats: 1. f̂ is biased, so this is really an interval for E(f̂(x)). Result: the bands can miss sharp peaks in the function. 2. Pointwise coverage does not imply simultaneous coverage for all x. The solution for 2 is to replace z_{α/2} with a larger number (Sun and Loader 1994). locfit() does this for you.

92 Introduction to Regression p. 79/97 More on locfit() > diaghat = predict.locfit(locfitwmap, where="data", what="infl") > normell = predict.locfit(locfitwmap, where="data", what="vari") diaghat will be L_ii, i = 1, 2, ..., n; normell will be ‖l(x_i)‖, i = 1, 2, ..., n. The Sun and Loader replacement for z_{α/2} is found using kappa0(locfitwmap)$crit.val.

93 Introduction to Regression p. 80/97 Example 95% pointwise confidence bands: [Figure: Cl versus l, showing the estimate, the 95% pointwise band assuming constant variance, and the 95% pointwise band using nonconstant variance.]

94 Introduction to Regression p. 81/97 Example 95% simultaneous confidence bands: [Figure: Cl versus l, showing the estimate, the 95% simultaneous band assuming constant variance, and the 95% simultaneous band using nonconstant variance.]

95 Introduction to Regression p. 81/97 Next... We discuss the use of regression splines. These are a well-motivated choice of basis for constructing regression functions.

96 Introduction to Regression p. 82/97 Basis Methods Idea: expand f as f(x) = Σ_j β_j ψ_j(x), where ψ_1(x), ψ_2(x), ... are specially chosen, known functions. Then estimate the β_j and set f̂(x) = Σ_j β̂_j ψ_j(x). Here we consider a popular version, splines.

97 Introduction to Regression p. 83/97 Splines and Penalization Define f̂_n to be the function that minimizes M(λ) = Σ_i (Y_i − f̂_n(x_i))^2 + λ ∫ (f''(x))^2 dx. λ = 0 ⇒ f̂_n(X_i) = Y_i (no smoothing); λ = ∞ ⇒ f̂_n(x) = β̂_0 + β̂_1 x (linear); 0 < λ < ∞ ⇒ f̂_n is a cubic spline with knots at the X_i. A cubic spline is a continuous function f such that 1. f is a cubic polynomial between the X_i and 2. f has continuous first and second derivatives at the X_i.

98 Introduction to Regression p. 84/97 Basis For Splines Define (z)_+ = max{z, 0} and N = n + 4, and let ψ_1(x) = 1, ψ_2(x) = x, ψ_3(x) = x^2, ψ_4(x) = x^3, ψ_5(x) = (x − X_1)^3_+, ψ_6(x) = (x − X_2)^3_+, ..., ψ_N(x) = (x − X_n)^3_+. These functions form a basis for the splines: we can write f(x) = Σ_{j=1}^N β_j ψ_j(x). (For numerical calculations it is actually more efficient to use other spline bases.) We can thus write f̂_n(x) = Σ_{j=1}^N β̂_j ψ_j(x).

99 Introduction to Regression p. 85/97 Basis For Splines We can now rewrite the minimization as follows: minimize (Y − Ψβ)^T (Y − Ψβ) + λ β^T Ω β, where Ψ_ij = ψ_j(X_i) and Ω_jk = ∫ ψ_j''(x) ψ_k''(x) dx. The value of β that minimizes this is β̂ = (Ψ^T Ψ + λΩ)^{-1} Ψ^T Y. The smoothing spline f̂_n(x) is a linear smoother; that is, there exist weights l(x) such that f̂_n(x) = Σ_{i=1}^n Y_i l_i(x).

100 Introduction to Regression p. 86/97 Basis For Splines [Figure: the basis functions ψ_j with 5 knots, plotted against x.]

101 Introduction to Regression p. 87/97 Basis For Splines [Figure: same span as the previous slide, the B-spline basis with 5 knots.]

102 Introduction to Regression p. 88/97 Smoothing Splines in R > smosplresult = smooth.spline(x, y, cv=FALSE, all.knots=TRUE) If cv=TRUE, then cross-validation is used to choose λ; if cv=FALSE, GCV is used. If all.knots=TRUE, then knots are placed at all data points; if all.knots=FALSE, then set nknots to specify the number of knots to be used. Using fewer knots eases the computational cost. predict(smosplresult) gives fitted values.
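
A usage sketch on simulated data: fit, then evaluate the spline on a grid with predict().

set.seed(1)
x <- runif(300)
y <- sin(2 * pi * x) + rnorm(300, sd = 0.3)
smosplresult <- smooth.spline(x, y, cv = FALSE, all.knots = TRUE)
grid <- seq(min(x), max(x), length.out = 200)
pred <- predict(smosplresult, x = grid)     # pred$x and pred$y trace out the fitted curve
# plot(x, y); lines(pred$x, pred$y)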

103 Introduction to Regression p. 89/97 Example WMAP data, λ chosen using GCV. [Figure: Cl versus l, with the smoothing spline fit.]

104 Introduction to Regression p. 89/97 Next... Finally, we address the challenges of fitting regression models when the predictor x is of large dimension.

105 Introduction to Regression p. 90/97 Recall: The Bias-Variance Tradeoff For nonparametric procedures, when x is one-dimensional, MISE ≈ c_1 h^4 + c_2/(nh), which is minimized at h = O(1/n^{1/5}). Hence MISE = O(1/n^{4/5}), whereas for parametric problems MISE = O(1/n).

106 Introduction to Regression p. 91/97 The Curse of Dimensionality For nonparametric procedures, when x is d-dimensional, MISE ≈ c_1 h^4 + c_2/(n h^d), which is minimized at h = O(1/n^{1/(4+d)}). Hence MISE = O(1/n^{4/(4+d)}), whereas for parametric problems, still, MISE = O(1/n).

107 Introduction to Regression p. 92/97 Multiple Regression Y = f(X_1, X_2, ..., X_d) + ε. The curse of dimensionality: The optimal rate of convergence for d = 1 is n^{−4/5}. In d dimensions the optimal rate of convergence is n^{−4/(4+d)}. Thus, the sample size m required for a d-dimensional problem to have the same accuracy as a sample size n in a one-dimensional problem is m ∝ n^{cd}, where c = (4 + d)/(5d) > 0. To maintain a given degree of accuracy of an estimator, the sample size must increase exponentially with the dimension d. Put another way, confidence bands get very large as the dimension d increases.

108 Introduction to Regression p. 93/97 Multiple Local Linear Given a nonsingular positive definite d × d bandwidth matrix H, we define K_H(x) = |H|^{−1/2} K(H^{−1/2} x). Often, one scales each covariate to have the same mean and variance and then uses the kernel h^{−d} K(‖x‖/h), where K is any one-dimensional kernel. Then there is a single bandwidth parameter h.

109 Introduction to Regression p. 94/97 Multiple Local Linear At a target value x = (x_1, ..., x_d)^T, the local sum of squares is given by Σ_{i=1}^n w_i(x) (Y_i − a_0 − Σ_{j=1}^d a_j (x_{ij} − x_j))^2, where w_i(x) = K(‖x_i − x‖/h). The estimator is f̂_n(x) = â_0, where â = (â_0, ..., â_d)^T is the value of a = (a_0, ..., a_d)^T that minimizes the weighted sum of squares.

110 Introduction to Regression p. 95/97 Multiple Local Linear The solution â is â = (X_x^T W_x X_x)^{-1} X_x^T W_x Y, where X_x is the n × (d + 1) matrix whose i-th row is (1, x_{i1} − x_1, ..., x_{id} − x_d) and W_x is the diagonal matrix whose (i, i) element is w_i(x).

111 Introduction to Regression p. 96/97 A Compromise: Additive Models Y = α + Σ_{j=1}^d f_j(x_j) + ε. Usually take α̂ = Ȳ. Then estimate the f_j by backfitting. 1. Set α̂ = Ȳ and f̂_1 = ··· = f̂_d = 0. 2. Iterate until convergence: for j = 1, ..., d: compute Ỹ_i = Y_i − α̂ − Σ_{k≠j} f̂_k(X_i), i = 1, ..., n; apply a smoother to the Ỹ_i on X_j to obtain f̂_j; set f̂_j(x) equal to f̂_j(x) − n^{-1} Σ_{i=1}^n f̂_j(x_i). The last step ensures that Σ_i f̂_j(X_i) = 0 (identifiability).
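
A sketch of backfitting in R for d = 2, using a simple kernel smoother for each component and a fixed number of sweeps in place of a formal convergence check; the data, bandwidth, and number of iterations are illustrative.

set.seed(1)
n <- 400
x1 <- runif(n)
x2 <- runif(n)
y <- 1 + sin(2 * pi * x1) + 4 * (x2 - 0.5)^2 + rnorm(n, sd = 0.3)
smooth1d <- function(xj, r, h = 0.1) {     # kernel-smooth the residuals r against xj
  W <- outer(xj, xj, function(a, b) dnorm((a - b) / h))
  as.vector((W / rowSums(W)) %*% r)
}
alphahat <- mean(y)
f1 <- f2 <- rep(0, n)
for (sweep in 1:20) {
  f1 <- smooth1d(x1, y - alphahat - f2); f1 <- f1 - mean(f1)   # update and center f1
  f2 <- smooth1d(x2, y - alphahat - f1); f2 <- f2 - mean(f2)   # update and center f2
}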

112 Introduction to Regression p. 97/97 To Sum Up... Classic linear regression is computationally and theoretically simple. Nonparametric procedures for regression offer increased flexibility. Careful consideration of MISE suggests use of local polynomial fits. Smoothing parameters can be chosen objectively. High-dimensional data present a challenge.
