Introduction to Regression
1 Introduction to Regression. David E. Jones (slides mostly by Chad M. Schafer). June 1. 1 / 102
2 Outline: General Concepts of Regression, Bias-Variance Tradeoff; Linear Regression; Nonparametric Procedures; Cross Validation; Local Polynomial Regression; Confidence Bands; Basis Methods: Splines; Multiple Regression. Notes based on All of Nonparametric Statistics by Larry Wasserman. 2 / 102
3 Basic Concepts in Regression. The Regression Problem: the objective is to construct a model that can be used to predict the response variable Y using observations of the predictor variables X. Note: in this framework, Y is real-valued, while X could be a vector. Example: predicting the distance modulus from redshift; Y = distance modulus, X = redshift. 3 / 102
4 Basic Concepts in Regression. The Regression Problem: observe a training sample (X 1 , Y 1 ), ..., (X n , Y n ), and use this to estimate f(x) = E(Y | X = x). Equivalently: estimate f(x) when Y i = f(x i ) + ɛ i , i = 1, 2, ..., n, where E(ɛ i ) = 0. Simple estimators when X is real-valued: f̂(x) = β̂ 0 + β̂ 1 x (parametric); f̂(x) = mean{Y i : |X i − x| ≤ h} (nonparametric). 4 / 102
5 Basic Concepts of Regression. Parametric Regression: Y = β 0 + β 1 X + ɛ; Y = β 0 + β 1 X 1 + ... + β d X d + ɛ; Y = β 0 e^{β 1 X 1 } + β 2 X 2 + ɛ. Nonparametric Regression: Y = f(X) + ɛ; Y = f 1 (X 1 ) + ... + f d (X d ) + ɛ; Y = f(X 1 , X 2 ) + ɛ. 5 / 102
6 Next We first look at a simple example from astronomy 5 / 102
7 Example: Type Ia Supernovae. [Figure: distance modulus vs. redshift.] From Riess et al. (2007), 182 Type Ia Supernovae. 6 / 102
8 Example: Type Ia Supernovae. Assumptions: observed (redshift, distance modulus) pairs (z i , Y i ) are related by Y i = f(z i ) + σ i ɛ i , where the ɛ i are iid standard normal. Error standard deviations σ i are assumed known. Goal: estimate f(·). 7 / 102
9 Example: Type Ia Supernovae. ΛCDM, two-parameter model: f(z) = 5 log[ (c(1 + z)/H 0 ) ∫ 0 z du / √( Ω m (1 + u) 3 + (1 − Ω m ) ) ], where H 0 = Hubble constant and Ω m = dimensionless matter density. 8 / 102
10 Example: Type Ia Supernovae. [Figure: distance modulus vs. redshift, with the fitted ΛCDM curve.] Estimate where H 0 = 72.76 and Ω m = 0.341 (the MLE). 9 / 102
11 Example: Type Ia Supernovae. [Figure: distance modulus vs. redshift.] Fourth order polynomial fit: Y = β 0 + β 1 z + β 2 z 2 + β 3 z 3 + β 4 z 4 + ɛ. 10 / 102
12 Example: Type Ia Supernovae. [Figure: distance modulus vs. redshift.] Fifth order polynomial fit: Y = β 0 + β 1 z + β 2 z 2 + β 3 z 3 + β 4 z 4 + β 5 z 5 + ɛ. 11 / 102
13 Next The previous two frames illustrate the fundamental challenge of constructing a regression model: How complex should it be? 11 / 102
14 The Bias Variance Tradeoff. Must choose a model to achieve a balance between: too simple → high bias, i.e., E(f̂(x)) not close to f(x) (precise, but not accurate); too complex → high variance, i.e., Var(f̂(x)) large (accurate, but not precise). This is the classic bias-variance tradeoff. Could be called the accuracy-precision tradeoff. 12 / 102
15 The Bias Variance Tradeoff. An Example: Consider the Linear Regression Model. These models are of the form f(x) = β 0 + Σ_{i=1}^{p} β i g i (x), where the g i (·) are fixed, specified functions (for example, g i (x) = x i ). Increasing p yields more complex models. 13 / 102
16 Next Linear regression models are the classic approach to regression Here, the response is modelled as a linear combination of p selected predictors 13 / 102
17 Linear Regression. Assume that Y = β 0 + Σ_{i=1}^{p} β i g i (x) + ɛ, where the g i (·) are specified functions, not estimated. Assume that E(ɛ) = 0, Var(ɛ) = σ 2. Often assume ɛ is Gaussian, but this is not essential. Start with the Simple Linear Regression model: Y = β 0 + β 1 x + ɛ. 14 / 102
18 Linear Regression. [Figure: redshift vs. g − r.] LRG data from 2SLAQ, www.2slaq.info. 15 / 102
19 Linear Regression. [Figure: redshift vs. g − r with the fitted line, showing one of the observations Y i and the corresponding fitted value Ŷ i .] 16 / 102
20 Linear Regression. [Figure: the same fit, showing one of the residuals, ɛ̂ i = Y i − Ŷ i .] 17 / 102
21 Linear Regression. The line fit here is Ŷ i = β̂ 0 + β̂ 1 x i , with β̂ 0 = 0.493 and β̂ 1 = ... These estimates of β 0 and β 1 minimize the sum of squared errors: SSE = Σ_{i=1}^{n} ɛ̂ i 2 = Σ_{i=1}^{n} (Y i − Ŷ i ) 2. This is least squares regression. 18 / 102
22 Linear Regression Least squares has a natural geometric interpretation 19 / 102
23 Linear Regression. The Design Matrix:
D =
  [ 1  g 1 (X 1 )  g 2 (X 1 )  ...  g p (X 1 ) ]
  [ 1  g 1 (X 2 )  g 2 (X 2 )  ...  g p (X 2 ) ]
  [ ...                                        ]
  [ 1  g 1 (X n )  g 2 (X n )  ...  g p (X n ) ]
Then L = D(D T D) −1 D T is the projection matrix. Note that ν = trace(L) = p + 1 = complexity of the model. 20 / 102
24 Linear Regression. Let Y be an n-vector filled with Y 1 , Y 2 , ..., Y n . The Fitted Values: Ŷ = LY = D β̂. The Least Squares Estimates of β: β̂ = (D T D) −1 D T Y. The Residuals: ɛ̂ = Y − Ŷ = (I − L)Y. 21 / 102
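The following is a minimal sketch (not from the slides) of these formulas in R, on simulated data with illustrative variable names; the last line checks the hand computation against lm().

set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.5)        # simulated data for illustration
D <- cbind(1, x)                            # design matrix (intercept and x)
betahat <- solve(t(D) %*% D, t(D) %*% y)    # (D^T D)^{-1} D^T Y
L <- D %*% solve(t(D) %*% D) %*% t(D)       # projection ("hat") matrix
yhat <- L %*% y                             # fitted values
ehat <- y - yhat                            # residuals
sum(diag(L))                                # trace(L) = p + 1 = 2
coef(lm(y ~ x))                             # agrees with betahat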
25 Linear Regression in R. Linear regression in R is done via the command lm():
> holdlm = lm(y ~ x)
Note: that is a tilde (~) between y and x. The default is to include an intercept term. If you want to exclude it, use
> holdlm = lm(y ~ x - 1)
22 / 102
26 Linear Regression in R. [Figure: scatterplot of y vs. x.] 23 / 102
27 Linear Regression in R. Fitting the model Y = β 0 + β 1 x + ɛ. The summary() shows the key output:
> holdlm = lm(y ~ x)
> summary(holdlm)
< CUT SOME HERE >
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      ...        ...       ...       ...  *
x                ...        ...       ...   ...e-10  ***
< CUT SOME HERE >
Note that β̂ 1 = 2.07; the SE for β̂ 1 is approximately ... 24 / 102
28 Linear Regression in R.
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      ...        ...       ...       ...  *
x                ...        ...       ...   ...e-10  ***
If the ɛ are Gaussian, then the last column gives the p-value for the test of H 0 : β i = 0 versus H 1 : β i ≠ 0. Also, if the ɛ are Gaussian, β̂ i ± 2 ŜE(β̂ i ) is an approximate 95% confidence interval for β i . 25 / 102
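As a quick sketch (not from the slides, and assuming the holdlm fit above), the same intervals can be formed from the summary table, or obtained directly with confint(), which uses exact t quantiles rather than the factor 2:

cf <- coef(summary(holdlm))    # matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
cbind(lower = cf[, "Estimate"] - 2 * cf[, "Std. Error"],
      upper = cf[, "Estimate"] + 2 * cf[, "Std. Error"])
confint(holdlm)                # nearly the same for moderate n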
29 Linear Regression in R. [Figure: residuals vs. fitted values.] Always a good idea to look at the plot of residuals versus fitted values. There should be no apparent pattern.
> plot(holdlm$fitted.values, holdlm$residuals)
26 / 102
30 Linear Regression in R. [Figure: normal probability plot, sample quantiles vs. theoretical quantiles.] Is it reasonable to assume the ɛ are Gaussian? Check the normal probability plot of the residuals.
> qqnorm(holdlm$residuals)
> qqline(holdlm$residuals)
27 / 102
31 Linear Regression in R Simple to include more terms in the model: > holdlm = lm(y ~ x + z) Unfortunately, this does not work: > holdlm = lm(y ~ x + x^2) Instead, can use: > holdlm = lm(y ~ x + I(x^2)) 28 / 102
32 The Correlation Coefficient. A standard way of quantifying the strength of the linear relationship between two variables is via the (Pearson) correlation coefficient. Sample Version: r = (1/(n − 1)) Σ_{i=1}^{n} X̃ i Ỹ i , where X̃ i and Ỹ i are the standardized versions of X i and Y i , i.e., X̃ i = (X i − X̄)/s X . 29 / 102
33 The Correlation Coefficient. Note that −1 ≤ r ≤ 1; r = 1 indicates a perfect linear, increasing relationship; r = −1 indicates a perfect linear, decreasing relationship. Also, note that β̂ 1 = r (s Y / s X ). In R: use cor(). 30 / 102
34 The Correlation Coefficient Examples of different scatter plots and values for r (Wikipedia) 31 / 102
35 The Coefficient of Determination. A very popular, but overused, regression diagnostic is the coefficient of determination, denoted R 2 :
R 2 = 1 − Σ_{i=1}^{n} (Y i − Ŷ i ) 2 / Σ_{i=1}^{n} (Y i − Ȳ) 2 = [ Σ_{i=1}^{n} (Y i − Ȳ) 2 − Σ_{i=1}^{n} (Y i − Ŷ i ) 2 ] / Σ_{i=1}^{n} (Y i − Ȳ) 2
The proportion of the variation explained by the model. Note that 0 ≤ R 2 ≤ 1, and R 2 = r 2 . But: large R 2 does not imply that the model is a good fit. 32 / 102
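A small sketch (not from the slides, reusing the simulated x and y from the design-matrix sketch) computing R 2 three equivalent ways for simple linear regression:

fit <- lm(y ~ x)
1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)   # 1 - SSE/SST
summary(fit)$r.squared                             # reported by summary()
cor(x, y)^2                                        # squared Pearson correlation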
36 Anscombe's Quartet (1973) 33 / 102
37 Next Now we will discuss nonparametric regression Reconsider the SNe example from the start of the lecture 33 / 102
38 Example: Type Ia Supernovae. [Figure: distance modulus vs. redshift, with a nonparametric estimate.] 34 / 102
39 The Bias Variance Tradeoff. Nonparametric Case: every nonparametric procedure requires a smoothing parameter h. Consider the regression estimator based on local averaging: f̂(x) = mean{Y i : |X i − x| ≤ h}. Increasing h yields a smoother estimate f̂. 35 / 102
40 What is nonparametric? [Diagram: Data and Assumptions both feed into the Estimate.] In the parametric case, the influence of the assumptions is fixed. 36 / 102
41 What is nonparametric? [Diagram: Data and Assumptions feed into the Estimate, with the weight on the assumptions controlled by h.] In the nonparametric case, the influence of the assumptions is controlled by h. 37 / 102
42 The Bias Variance Tradeoff. Squared error loss: L(f(x), f̂(x)) = (f(x) − f̂(x)) 2 . Mean squared error (MSE): MSE = E( L(f(x), f̂(x)) ) = bias x 2 + variance x , where bias x = E(f̂(x)) − f(x) and variance x = Var(f̂(x)). 38 / 102
43 The Bias Variance Tradeoff. [Figure: risk, squared bias, and variance as functions of the amount of smoothing (less to more); the optimal smoothing balances the two.] 39 / 102
44 Bias Variance Tradeoff: choosing h. Want to minimize the average MSE (risk): (1/n) Σ_{i=1}^{n} E( (f(x i ) − f̂(x i )) 2 ). Estimate it with the leave-one-out cross-validation score: CV = (1/n) Σ_{i=1}^{n} (Y i − f̂ (−i) (x i )) 2 , where f̂ (−i) is the estimator obtained by omitting the i-th pair (x i , Y i ). 40 / 102
45 The Bias Variance Tradeoff: choosing h. [Figure: risk estimated via cross-validation, as a function of h.] 41 / 102
46 Bias Variance Tradeoff: what the theory says. In certain settings, the optimal choice of h is h = O(1/n 1/5 ), and the corresponding minimum risk is Risk = O(1/n 4/5 ). However, for parametric problems, when the model is correct, Risk = O(1/n). Message: models are helpful if we have a good reason to believe they are approximately correct. 42 / 102
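A minimal sketch (not from the slides) of leave-one-out cross-validation for the local average (boxcar) estimator over a grid of bandwidths, on simulated data; all names are illustrative:

set.seed(2)
x <- sort(runif(100))
y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)
loc_avg <- function(x0, xs, ys, h) {
  in_window <- abs(xs - x0) <= h / 2
  if (!any(in_window)) return(NA)       # empty window: no estimate
  mean(ys[in_window])
}
cv_score <- function(h) {
  errs <- sapply(seq_along(x), function(i)
    (y[i] - loc_avg(x[i], x[-i], y[-i], h))^2)
  mean(errs, na.rm = TRUE)
}
hs <- seq(0.05, 0.5, by = 0.05)
cvs <- sapply(hs, cv_score)
hs[which.min(cvs)]                      # bandwidth minimizing the estimated risk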
47 Next. Now we will take a closer look at a class of nonparametric regression procedures that share a common structure with linear regression. Namely, they are all linear smoothers. 42 / 102
48 Linear Smoothers. For a linear smoother: f̂(x) = Σ i l i (x) Y i for some weights l 1 (x), ..., l n (x). The vector of weights depends on the target point x. For example, in linear regression, we saw Ŷ = LY, and therefore (l 1 (X i ), ..., l n (X i )) is given by the i-th row of the projection matrix L = D(D T D) −1 D T . 43 / 102
49 Linear Smoothers. If f̂(x) = Σ_{i=1}^{n} l i (x) Y i , then Ŷ = LY, where Ŷ = (f̂(X 1 ), ..., f̂(X n )) T and L is the n × n matrix whose (i, j) entry is l j (X i ). The effective degrees of freedom is ν = trace(L) = Σ_{i=1}^{n} L ii . It can be interpreted as the number of parameters in the model. Recall that ν = p + 1 for linear regression. 44 / 102
50 Linear Smoothers. Consider the Extreme Cases: when l i (X j ) = 1/n for all i, j, f̂(X j ) = Ȳ for all j and ν = 1; when l i (X j ) = δ ij for all i, j, f̂(X j ) = Y j and ν = n. 45 / 102
51 Next One of the simplest examples is binning 45 / 102
52 Binning. Divide the x-axis into bins B 1 , B 2 , ... of width h. f̂ is a step function based on averaging the Y i s in each bin: for x ∈ B j , f̂(x) = mean{Y i : X i ∈ B j } = (1/n j ) Σ_{i=1}^{n} 1{X i ∈ B j } Y i , where n j is the number of observations in B j . The (arbitrary) choice of the boundaries of the bins can affect inference, especially when h is large. 46 / 102
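A small sketch (not from the slides) of a binned estimate in R, using cut() and tapply() to average Y within each bin; x and y are the simulated data from the cross-validation sketch above:

binned_fit <- function(x, y, h) {
  breaks <- seq(min(x), max(x) + h, by = h)
  bins <- cut(x, breaks, include.lowest = TRUE)
  means <- tapply(y, bins, mean)                 # bin averages
  list(breaks = breaks, means = means,
       fitted = as.numeric(means[bins]))         # step-function value at each x
}
fit_bin <- binned_fit(x, y, h = 0.1)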
53 Binning. [Figure: WMAP data, Cl vs. l, binned estimate with h = 15.] 47 / 102
54 Binning. [Figure: WMAP data, Cl vs. l, binned estimate with h = 50.] 48 / 102
55 Binning. [Figure: WMAP data, Cl vs. l, binned estimate with h = 75.] 49 / 102
56 Next A step beyond binning is to take local averages to estimate the regression function 49 / 102
57 Local Averaging. For each x let N x = [x − h/2, x + h/2]. This is a moving window of length h, centered at x. Define f̂(x) = mean{Y i : X i ∈ N x }. This is like binning but removes the arbitrary boundaries. 50 / 102
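A short sketch (not from the slides), reusing loc_avg and the simulated x, y from the cross-validation sketch, evaluating the moving-window average on a grid for plotting:

grid <- seq(min(x), max(x), length.out = 200)
fhat <- sapply(grid, loc_avg, xs = x, ys = y, h = 0.2)
plot(x, y)
lines(grid, fhat)        # the local average estimate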
58 Local Averaging. [Figure: WMAP data, Cl vs. l, binned and local average estimates with h = 15.] 51 / 102
59 Local Averaging. [Figure: WMAP data, Cl vs. l, binned and local average estimates with h = 50.] 52 / 102
60 Local Averaging. [Figure: WMAP data, Cl vs. l, binned and local average estimates with h = 75.] 53 / 102
61 Next Local averaging is a special case of kernel regression 53 / 102
62 Kernel Regression. The local average estimator can be written f̂(x) = Σ_{i=1}^{n} Y i K((x − X i )/h) / Σ_{i=1}^{n} K((x − X i )/h), where K(x) = 1 if |x| < 1/2 and 0 otherwise. Can improve this by using a function K which is smoother. 54 / 102
63 Kernels. A kernel is any smooth function K such that K(x) ≥ 0 and ∫ K(x) dx = 1, ∫ x K(x) dx = 0 and σ K 2 ≡ ∫ x 2 K(x) dx > 0. Some commonly used kernels are the following: the boxcar kernel K(x) = (1/2) I(|x| < 1); the Gaussian kernel K(x) = (1/√(2π)) e^{−x 2 /2}; the Epanechnikov kernel K(x) = (3/4)(1 − x 2 ) I(|x| < 1); the tricube kernel K(x) = (1 − |x| 3 ) 3 I(|x| < 1). 55 / 102
64 Kernels. [Figure: the four kernels plotted.] 56 / 102
65 Kernel Regression. We can write this as f̂(x) = Σ i Y i l i (x), where l i (x) = K((x − X i )/h) / Σ_{j=1}^{n} K((x − X j )/h). 57 / 102
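A minimal sketch (not from the slides) of this estimator with a Gaussian kernel, written directly from the weight formula, together with base R's ksmooth(), which implements the same idea (its bandwidth is scaled differently, so the two fits need not match exactly); x, y, and grid come from the earlier sketches:

nw_fit <- function(x0, xs, ys, h) {
  w <- dnorm((x0 - xs) / h)          # kernel weights K((x - X_i)/h)
  sum(w * ys) / sum(w)               # weighted average of the Y_i
}
fhat_nw <- sapply(grid, nw_fit, xs = x, ys = y, h = 0.1)
fhat_ks <- ksmooth(x, y, kernel = "normal", bandwidth = 0.1, x.points = grid)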
66 Kernel Regression. [Figure: WMAP data, Cl vs. l, kernel regression estimates with boxcar, Gaussian, and tricube kernels, h = 15.] 58 / 102
67 Kernel Regression. [Figure: the same estimates with h = 50.] 59 / 102
68 Kernel Regression. [Figure: the same estimates with h = 75.] 60 / 102
69 Kernel Regression: what the theory says. Suppose the x values are distributed according to the density g. Then for estimating f(x), kernel regression has variance (σ 2 / (g(x) n h)) ∫ K 2 (u) du and bias (h 2 /2) ( f''(x) + 2 f'(x) g'(x)/g(x) ) ∫ u 2 K(u) du, where g(x) is the density for X. Problem: the presence of the term 2 f'(x) g'(x)/g(x) = design bias. Another problem is that there is particularly large bias near the boundaries. 61 / 102
70 Kernel Regression. [Figure: WMAP data, h = 50, zoomed in; boxcar, Gaussian, and tricube kernel estimates. Note the boundary bias.] 62 / 102
71 Next The deficiencies of kernel regression can be addressed by considering models which, on small scales, model the regression function as a low order polynomial 62 / 102
72 Local Polynomial Regression. Recall polynomial regression: f̂(x) = β̂ 0 + β̂ 1 x + β̂ 2 x 2 + ... + β̂ p x p , where β̂ = (β̂ 0 , ..., β̂ p ) are obtained by least squares: minimize Σ_{i=1}^{n} ( Y i − [β 0 + β 1 X i + β 2 X i 2 + ... + β p X i p ] ) 2 . 63 / 102
73 Local Polynomial Regression. Local polynomial regression: approximate f(x) locally by a different polynomial for every x: f(u) ≈ β 0 (x) + β 1 (x)(u − x) + β 2 (x)(u − x) 2 + ... + β p (x)(u − x) p for u near x. Estimate (β̂ 0 (x), ..., β̂ p (x)) by local least squares: minimize Σ_{i=1}^{n} ( Y i − [β 0 (x) + β 1 (x)(X i − x) + ... + β p (x)(X i − x) p ] ) 2 K((X i − x)/h), where the K term supplies the kernel weights. Then f̂(x) = β̂ 0 (x). 64 / 102
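A minimal sketch (not from the slides) of the local linear case (p = 1) at a single target point, via weighted least squares with Gaussian kernel weights; x, y, and grid are from the earlier sketches and h = 0.1 is arbitrary:

loclin_fit <- function(x0, xs, ys, h) {
  w <- dnorm((xs - x0) / h)                 # kernel weights
  fit <- lm(ys ~ I(xs - x0), weights = w)   # local line in (u - x0)
  unname(coef(fit)[1])                      # intercept = estimate of f(x0)
}
fhat_ll <- sapply(grid, loclin_fit, xs = x, ys = y, h = 0.1)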
74 Local Polynomial Regression. The local polynomial regression estimate is f̂(x) = Σ_{i=1}^{n} l i (x) Y i = β̂ 0 (x), where l(x) T = (l 1 (x), ..., l n (x)), l(x) T = e 1 T (D x T W x D x ) −1 D x T W x , e 1 = (1, 0, ..., 0) T , D x is the matrix whose i-th row is (1, X i − x, ..., (X i − x) p ), and W x is the diagonal matrix with entries K((x − X i )/h). Compare to standard least squares, which gives l(X i ) T as the i-th row of L = D(D T D) −1 D T . 65 / 102
75 Local Polynomial Regression. So for Ŷ = (Ŷ 1 , ..., Ŷ n ) with Ŷ i = f̂(X i ), we have Ŷ = LY, where L is the smoothing matrix whose (i, j) entry is l j (X i ). The effective degrees of freedom is ν = trace(L) = Σ_{i=1}^{n} L ii . 66 / 102
76 Local Polynomial Regression. Taking p = 0, f̂(x) minimizes Σ_{i=1}^{n} (Y i − β 0 (x)) 2 K((x − X i )/h), which exactly gives the kernel regression estimator f̂(x) = Σ_{i=1}^{n} l i (x) Y i with l i (x) = K((x − X i )/h) / Σ_{j=1}^{n} K((x − X j )/h). Taking p = 1 yields the local linear estimator, with f̂(x) minimizing Σ_{i=1}^{n} (Y i − [β 0 (x) + β 1 (x)(X i − x)]) 2 K((x − X i )/h). This is a very good all-purpose smoother. Choice of kernel K: not important. Choice of bandwidth h: crucial. 67 / 102
77 Local Polynomial Regression: what the theory says. Local linear (p = 1) is better than kernel (p = 0): 1. Both have similar variance, but the kernel estimator has bias (h 2 /2) ( f''(x) + 2 f'(x) g'(x)/g(x) ) ∫ u 2 K(u) du, whereas the local linear estimator has bias (h 2 /2) f''(x) ∫ u 2 K(u) du. Thus, the local linear estimator is free from design bias. 2. At the boundary points, the kernel estimator has bias of O(h) while the local linear estimator has bias O(h 2 ). 68 / 102
78 Local Polynomial Regression: choosing h (and p). Cross-validation revisited: minimize the leave-one-out cross-validation score CV = (1/n) Σ_{i=1}^{n} (Y i − f̂ (−i) (x i )) 2 . Amazing shortcut formula: CV = (1/n) Σ_{i=1}^{n} ( (Y i − f̂(x i )) / (1 − L ii ) ) 2 . A commonly used approximation is GCV (generalized cross-validation): GCV = (1/n) Σ_{i=1}^{n} (Y i − f̂(x i )) 2 / (1 − ν/n) 2 , with ν = trace(L). 69 / 102
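A quick numerical check of the shortcut formula (a sketch, not from the slides), using an ordinary linear fit, for which the identity also holds exactly since lm() is a linear smoother; x and y are the simulated data from the earlier sketch:

fit <- lm(y ~ x, data = data.frame(x = x, y = y))
h_ii <- hatvalues(fit)                               # diagonal of the smoothing matrix L
cv_shortcut <- mean((residuals(fit) / (1 - h_ii))^2)
cv_direct <- mean(sapply(seq_along(y), function(i) {
  fit_i <- lm(y ~ x, data = data.frame(x = x[-i], y = y[-i]))   # refit without observation i
  (y[i] - predict(fit_i, newdata = data.frame(x = x[i])))^2
}))
c(cv_shortcut, cv_direct)                            # identical up to rounding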
79 Next Next, we will go over a bit of how to perform local polynomial regression in R 69 / 102
80 Using locfit(). Need to include the locfit library:
> install.packages("locfit")
> library(locfit)
> result = locfit(y ~ x, alpha=c(0, 15), deg=1)
y and x are vectors; the second argument to alpha gives the bandwidth (h); the first argument to alpha specifies the nearest neighbor fraction, an alternative to the bandwidth. fitted(result) gives the fitted values, f̂(x i ); residuals(result) gives the residuals, Y i − f̂(x i ). 70 / 102
81 Using locfit(). [Figure: data from y = x(1 − x) sin(2.1π/(x + 0.2)) + noise; estimate using h = 0.005 vs. estimate using the GCV-optimal h = 0.015; degree = 1, n = 1000, sigma = 0.1.] 71 / 102
82 Using locfit(). [Figure: same data; estimate using h = 0.1 vs. estimate using the GCV-optimal h = 0.015; degree = 1, n = 1000, sigma = 0.1.] 72 / 102
83 Using locfit(). [Figure: data from y = x + 5 exp(−4x 2 ) + noise; estimate using h = 0.05 vs. estimate using the GCV-optimal h = 0.174; degree = 1, n = 1000, sigma = 0.5.] 73 / 102
84 Using locfit(). [Figure: same data; estimate using h = 0.5 vs. estimate using the GCV-optimal h = 0.174; degree = 1, n = 1000, sigma = 0.5.] 74 / 102
85 Example. [Figure: WMAP data, local linear fit; estimate using h = 15 vs. estimate using the GCV-optimal h.] 75 / 102
86 Example. [Figure: WMAP data, local linear fit; estimate using h = 125 vs. estimate using the GCV-optimal h.] 76 / 102
87 Next We should quantify the amount of error in our estimator for the regression function Next we will look at how this can be done using local polynomial regression 76 / 102
88 Variance Estimation. Let σ̂ 2 = Σ_{i=1}^{n} (Y i − f̂(x i )) 2 / (n − 2ν + ν̃), where ν = tr(L) and ν̃ = tr(L T L) = Σ_{i=1}^{n} ‖l(x i )‖ 2 . If f is sufficiently smooth, then σ̂ 2 is a consistent estimator of σ 2 . 77 / 102
89 Variance Estimation. For the WMAP data, using the local linear fit:
> wmap = read.table("wmap.dat", header=T)
> opth = 630
> locfitwmap = locfit(wmap$cl[1:700] ~ wmap$ell[1:700], alpha=c(0, opth), deg=1)
> nu = as.numeric(locfitwmap$dp[6])
> nutilde = as.numeric(locfitwmap$dp[7])
> sigmasqrhat = sum(residuals(locfitwmap)^2) / (700 - 2*nu + nutilde)
> sigmasqrhat
[1] ...
But, it does not seem reasonable to assume homoscedasticity. 78 / 102
90 Variance Estimation. Allow σ to be a function of x: Y i = f(x i ) + σ(x i ) ɛ i . Let Z i = log(Y i − f(x i )) 2 and δ i = log ɛ i 2 . Then Z i = log( σ 2 (x i ) ) + δ i . This suggests estimating log σ 2 (x) by regressing the log squared residuals on x. 79 / 102
91 Variance Estimation. 1. Estimate f(x) with any nonparametric method to get an estimate f̂(x). 2. Define Z i = log(Y i − f̂(x i )) 2 . 3. Regress the Z i s on the x i s (again using any nonparametric method) to get an estimate q̂(x) of log σ 2 (x), and let σ̂ 2 (x) = e^{q̂(x)} . 80 / 102
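A minimal sketch of this recipe (not from the slides), using loess() as the generic nonparametric smoother in both steps; x and y are the simulated data from the earlier sketches and the span values are arbitrary:

fit_mean <- loess(y ~ x, span = 0.3)          # step 1: estimate f(x)
z <- log((y - fitted(fit_mean))^2)            # step 2: log squared residuals
fit_logvar <- loess(z ~ x, span = 0.5)        # step 3: estimate log sigma^2(x)
sigma_hat <- sqrt(exp(fitted(fit_logvar)))    # estimated sigma(x) at the observed x's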
92 Example. [Figure: WMAP data, log squared residuals vs. l, with a local linear fit, h = 630.] 81 / 102
93 Example. [Figure: log squared residuals vs. l, with a local linear fit, h = 130, chosen via GCV.] 82 / 102
94 Example. [Figure: the resulting estimate σ̂(x), plotted against l.] 83 / 102
95 Confidence Bands. Recall that f̂(x) = Σ i Y i l i (x), so Var(f̂(x)) = Σ i σ 2 (X i ) l i 2 (x). An approximate 1 − α confidence interval for f(x) is f̂(x) ± z α/2 √( Σ i σ̂ 2 (X i ) l i 2 (x) ). When σ(x) is smooth, we can approximate √( Σ i σ 2 (X i ) l i 2 (x) ) ≈ σ(x) ‖l(x)‖. 84 / 102
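A sketch (not from the slides) of pointwise bands for the local linear fit from the earlier sketch, computing the weight vector l(x) explicitly at each grid point; sigma_hat and fhat_ll come from the previous sketches and h = 0.1 is arbitrary:

loclin_weights <- function(x0, xs, h) {
  Dx <- cbind(1, xs - x0)
  W  <- diag(dnorm((xs - x0) / h))
  as.numeric(solve(t(Dx) %*% W %*% Dx, t(Dx) %*% W)[1, ])   # e1^T (Dx^T Wx Dx)^{-1} Dx^T Wx
}
se_grid <- sapply(grid, function(x0) {
  l <- loclin_weights(x0, x, h = 0.1)
  sqrt(sum(sigma_hat^2 * l^2))       # sqrt( sum sigma^2(X_i) l_i(x)^2 )
})
upper <- fhat_ll + 1.96 * se_grid
lower <- fhat_ll - 1.96 * se_grid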
96 Confidence Bands. Two caveats: 1. f̂ is biased, so this is really an interval for E(f̂(x)). Result: bands can miss sharp peaks in the function. 2. Pointwise coverage does not imply simultaneous coverage for all x. The solution for 2 is to replace z α/2 with a larger number (Sun and Loader 1994); locfit() does this for you. 85 / 102
97 More on locfit().
> normell = predict.locfit(locfitwmap, where="data", what="vari")
normell will be ‖l(x i )‖, i = 1, 2, ..., n (needed for the variance calculation). The Sun and Loader replacement for z α/2 is found using kappa0(locfitwmap)$crit.val. 86 / 102
98 Example. 95% pointwise confidence bands: [Figure: Cl vs. l, the estimate with 95% pointwise bands assuming constant variance and using nonconstant variance.] 87 / 102
99 Example. 95% simultaneous confidence bands: [Figure: Cl vs. l, the estimate with 95% simultaneous bands assuming constant variance and using nonconstant variance.] 88 / 102
100 Next We discuss the use of regression splines These are a well-motivated choice of basis for constructing regression functions 88 / 102
101 Basis Methods. Idea: expand f as f(x) = Σ j β j ψ j (x), where ψ 1 (x), ψ 2 (x), ... are specially chosen, known functions. Then estimate the β j and set f̂(x) = Σ j β̂ j ψ j (x). Here we consider a popular version, splines. 89 / 102
102 Splines and Penalization. Define f̂ to be the function that minimizes M(λ) = Σ i (Y i − f(x i )) 2 + λ ∫ (f''(x)) 2 dx. λ = 0 ⇒ f̂(x i ) = Y i (no smoothing); λ = ∞ ⇒ f̂(x) = β̂ 0 + β̂ 1 x (linear); 0 < λ < ∞ ⇒ f̂ is a cubic spline with knots at the X i . A cubic spline is a continuous function f such that: 1. f is a cubic polynomial between the X i s; 2. f has continuous first and second derivatives at the X i s. 90 / 102
103 Basis For Splines. Define (z) + = max{z, 0} and N = n + 4, with ψ 1 (x) = 1, ψ 2 (x) = x, ψ 3 (x) = x 2 , ψ 4 (x) = x 3 , ψ 5 (x) = (x − X 1 ) + 3 , ψ 6 (x) = (x − X 2 ) + 3 , ..., ψ N (x) = (x − X n ) + 3 . These functions form a basis for the splines, so we can write f(x) = Σ_{j=1}^{N} β j ψ j (x). (1) (For numerical calculations it is actually more efficient to use other spline bases.) 91 / 102
104 Basis For Splines. We can now rewrite the minimization as follows: minimize (Y − Ψβ) T (Y − Ψβ) + λ β T Ω β, where Ψ ij = ψ j (X i ) and Ω jk = ∫ ψ j ''(x) ψ k ''(x) dx. The value of β that minimizes this is β̂ = (Ψ T Ψ + λΩ) −1 Ψ T Y. The smoothing spline f̂(x) is a linear smoother, that is, there exist weights l(x) such that f̂(x) = Σ_{i=1}^{n} Y i l i (x). 92 / 102
105 Basis For Splines. [Figure: the basis functions ψ j with 5 knots.] 93 / 102
106 Basis For Splines. [Figure: same span as the previous frame, the B-spline basis with 5 knots.] 94 / 102
107 Smoothing Splines in R.
> smosplresult = smooth.spline(x, y, cv=FALSE, all.knots=TRUE)
If cv=TRUE, then cross-validation is used to choose λ; if cv=FALSE, GCV is used. If all.knots=TRUE, then knots are placed at all data points; if all.knots=FALSE, then set nknots to specify the number of knots to be used. Using fewer knots eases the computational cost. predict(smosplresult) gives fitted values. 95 / 102
108 Example. [Figure: WMAP data, Cl vs. l, smoothing spline fit with λ chosen using GCV.] 96 / 102
109 Next Finally, we consider the challenges of fitting regression models when the predictor x is of large dimension 96 / 102
110 Multiple Regression. Y = f(X 1 , X 2 , ..., X d ) + ɛ. The curse of dimensionality: the optimal rate of convergence for d = 1 is n −4/5 . In d dimensions the optimal rate of convergence is n −4/(4+d) . Thus, the sample size m required for a d-dimensional problem to have the same accuracy as a sample size n in a one-dimensional problem is m ∝ n cd , where c = (4 + d)/(5d) > 0. To maintain a given degree of accuracy of an estimator, the sample size must increase exponentially with the dimension d. Put another way, confidence bands get very large as the dimension d increases. 97 / 102
111 Multiple Local Linear Regression. There is now a d × d bandwidth matrix H (nonsingular, positive definite) and we use K(H −1/2 x). Often, we instead scale each covariate to have the same mean and variance and then use K(‖x‖/h), where K is any one-dimensional kernel. Then there is a single bandwidth parameter h. 98 / 102
112 Multiple Local Linear Regression. At a target value x = (x 1 , ..., x d ) T , the local sum of squares is given by Σ_{i=1}^{n} w i (x) ( Y i − β 0 − Σ_{j=1}^{d} β j (X ij − x j ) ) 2 , where w i (x) = K(‖X i − x‖/h). The estimator is f̂(x) = β̂ 0 , where β̂ = (β̂ 0 , ..., β̂ d ) T is the value of β = (β 0 , ..., β d ) T that minimizes the weighted sum of squares. 99 / 102
113 Multiple Local Linear Regression. The solution is β̂ = (D x T W x D x ) −1 D x T W x Y, where D x is the n × (d + 1) matrix whose i-th row is (1, X i1 − x 1 , ..., X id − x d ), and W x is the diagonal matrix whose (i, i) element is w i (x). 100 / 102
114 A Compromise: Additive Models. Y = α + Σ_{j=1}^{d} f j (X j ) + ɛ. Usually take α̂ = Ȳ. Then estimate the f j s by backfitting: 1. Set α̂ = Ȳ and f̂ 1 = ... = f̂ d = 0. 2. Iterate until convergence: for j = 1, ..., d: compute Ỹ i = Y i − α̂ − Σ_{k ≠ j} f̂ k (X ik ), i = 1, ..., n; apply a smoother to Ỹ on (X 1j , ..., X nj ) T to obtain f̂ j ; set f̂ j (x) equal to f̂ j (x) − n −1 Σ_{i=1}^{n} f̂ j (X ij ). The last step ensures that Σ_{i=1}^{n} f̂ j (X ij ) = 0 (identifiability). 101 / 102
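A minimal sketch of backfitting (not from the slides) with d = 2 additive components, using smooth.spline() as the univariate smoother; the simulated data and the fixed number of sweeps are illustrative. In practice an additive-model package (for example gam() in mgcv) automates this.

set.seed(3)
n <- 500
x1 <- runif(n); x2 <- runif(n)
yy <- sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.2)
alpha <- mean(yy)
f1 <- rep(0, n); f2 <- rep(0, n)
for (sweep in 1:20) {                              # iterate the two smoothing steps
  s1 <- smooth.spline(x1, yy - alpha - f2)         # smooth partial residuals on x1
  f1 <- predict(s1, x1)$y; f1 <- f1 - mean(f1)     # center for identifiability
  s2 <- smooth.spline(x2, yy - alpha - f1)         # smooth partial residuals on x2
  f2 <- predict(s2, x2)$y; f2 <- f2 - mean(f2)
}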
115 To Sum Up. Classic linear regression is computationally and theoretically simple. Nonparametric procedures for regression offer increased flexibility. Local polynomial regression improves on kernel regression. Bandwidth can be chosen using cross validation. High-dimensional data present a challenge. Not covered: Bayesian methods. 102 / 102