Modern Regression Basics


1 Modern Regression Basics
T. W. Yee, University of Auckland
t.yee@auckland.ac.nz
October, Cagliari

2 Outline of This Talk
1 Linear Models
2 Generalized Linear Models (GLMs)
3 Smoothing
4 Generalized Additive Models (GAMs)
5 Introduction to VGLMs and VGAMs
6 Concluding Remarks

3 Linear Models
Data (x_i, y_i, w_i), i = 1, ..., n, with Var(ε_i) = σ²/w_i and

    E(Y_i) = η(x_i) = Σ_{k=1}^p x_{ik} β_k.

That is,

    y = X β + ε,   ε ~ N(0, σ² W^{-1}).   (1)

X is an n × p matrix (assumed of rank p), and β is a p-vector of regression coefficients (parameters).
The t-test, ANOVA, multiple linear regression etc. are special cases of (1).

4 Linear Models: Estimation I
Estimate β by weighted least squares (WLS):

    β̂ = argmin Σ_{i=1}^n w_i (y_i - Σ_{k=1}^p x_{ik} β_k)²
       = argmin (y - Xβ)^T W (y - Xβ).

The solution is (from the normal equations)

    β̂ = (X^T W X)^{-1} X^T W y,   (2)
    ŷ = X β̂.                      (3)

Also, the variance-covariance matrix of β̂ is

    Var(β̂) = σ² (X^T W X)^{-1}.   (4)
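As a quick check of (2)-(4), here is a small sketch (simulated data, hypothetical names) comparing the normal-equations solution with lm() and a weights argument:

set.seed(1)
n <- 50
X <- cbind(1, runif(n), runif(n))                        # n x p design matrix (p = 3)
w <- runif(n, 0.5, 2)                                    # known positive weights
y <- drop(X %*% c(1, 2, -1)) + rnorm(n, sd = 1/sqrt(w))  # Var(eps_i) = sigma^2 / w_i
W <- diag(w)
betahat <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)     # (X' W X)^{-1} X' W y
cbind(betahat, coef(lm(y ~ X - 1, weights = w)))         # the two columns agree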

5 Linear Models: Estimation II
Suppose W = I_n. Then LS has a very nice geometric interpretation. Also, ŷ = H y where

    H = X (X^T X)^{-1} X^T.   (5)

Note that H = H² (idempotent) and H = H^T (symmetric), hence H is a projection matrix. Such a matrix represents an orthogonal projection.
The eigenvalues of H are p 1's and (n - p) 0's. Consequently, trace(H) = rank(H).
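A small numerical sketch (unweighted case, W = I, simulated design) verifying the projection properties of H stated above:

set.seed(2)
n <- 20
X <- cbind(1, rnorm(n), rnorm(n))                    # p = 3 columns
H <- X %*% solve(t(X) %*% X) %*% t(X)                # hat matrix (5)
max(abs(H %*% H - H))                                # ~0: idempotent
max(abs(H - t(H)))                                   # ~0: symmetric
sum(diag(H))                                         # trace(H) = p = 3
round(sort(eigen(H)$values, decreasing = TRUE), 6)   # three 1s and (n - 3) 0s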

6 Linear Models
[Figure.]

7 Linear Models: S Model Formulae I
The S model formula was adopted from Wilkinson and Rogers (1973).
Form: response ~ expression
LHS = the response (usually a vector in a data frame, or a matrix).
RHS = explanatory variables.

8 Linear Models: S Model Formulae II
Consider

> y ~ x1 + x2 + x3 + f1:f2 + f1 * x1 + f2/f3 + f3:f4:f5 +
+     (f6 + f7)^2

where variables beginning with an x are numeric and those beginning with an f are factors.
By default an intercept is fitted, denoted 1. Suppress the intercept with -1.
The interaction f1*f2 expands to 1 + f1 + f2 + f1:f2. The terms f1 and f2 are main effects.
A second-order interaction between two factors, factor:factor, contributes terms of the form γ_ij. There are other types of interactions. Interactions between a factor and a numeric, factor:numeric, produce terms of the form β_j x.

9 Linear Models: S Model Formulae III
Interactions between two numerics, numeric:numeric, produce a cross-product term such as β x_2 x_3.
The term (f6 + f7)^2 expands to f6 + f7 + f6:f7. A term (f6 + f7 + f8)^2 - f7:f8 would expand to all main effects and all second-order interactions except for f7:f8.
Nesting is achieved by /, e.g., f2/f3 is shorthand for 1 + f2 + f3:f2, or equivalently,

> 1 + f2 + f3 %in% f2

Example: f2 = state and f3 = county.
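To see how these operators expand in practice, one can inspect terms() and model.matrix(); a small sketch with a made-up data frame (hypothetical variable names) follows.

d <- data.frame(f2 = factor(rep(c("stateA", "stateB"), each = 4)),
                f3 = factor(rep(c("c1", "c2"), times = 4)),
                x1 = rnorm(8), y = rnorm(8))
attr(terms(y ~ f2/f3, data = d), "term.labels")   # "f2" "f2:f3": the nesting expansion
colnames(model.matrix(y ~ f2/f3, data = d))       # intercept, f2, then f3 within f2
colnames(model.matrix(y ~ f2 * x1, data = d))     # main effects plus an f2:x1 slope term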

10 Linear Models: S Model Formulae IV
There are times when you need to use the identity function I(), e.g., because ^ has special meaning:

> lm(y ~ -1 + offset(a) + x1 + I(x2 - 1) + I(x3^3))

fits

    y_i = a_i + β_1 x_{i1} + β_2 (x_{i2} - 1) + β_3 x_{i3}³ + ε_i,   ε_i ~ iid N(0, σ²),  i = 1, ..., n,

where a is a vector containing the (known) a_i.
Other functions: factor(), as.factor(), ordered(), terms(), levels(), options().

11 Linear Models: S generics
Generic functions are available for lm objects. They include add1(), anova(), coef(), deviance(), drop1(), plot(), predict(), print(), residuals(), step(), summary(), update().
Other less used generic functions are alias(), effects(), family(), kappa(), labels(), proj().
Some other functions are model.matrix(), options().

12 Linear Models: The lm() Function

> args(lm)
function (formula, data, subset, weights, na.action, method = "qr",
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
    contrasts = NULL, offset, ...)
NULL

The most useful arguments are weights, subset, na.action (na.fail(), na.omit()), contrasts.
Data frames: read.table(), write.table(), na.omit().

13 Linear Models: Factors I

> options()$contrasts
        unordered           ordered
"contr.treatment"      "contr.poly"

Note:
1 contr.treatment is used so that each coefficient compares that level with level 1 (omitting level 1 itself).
2 contr.sum constrains the coefficients to sum to zero.
3 contr.poly is used for equally spaced, equally replicated orthogonal polynomial contrasts.

14 Linear Models: Factors II
One can change them by, for example,

> options(contrasts = c("contr.treatment", "contr.poly"))

Here, the first level of the factor is the baseline level.

Table: Dummy variables (partial method).

RACE       D1  D2  D3
White       0   0   0
Black       1   0   0
Hispanic    0   1   0
Other       0   0   1

15 Linear Models: Factors III
An example:

> options(contrasts = c("contr.treatment", "contr.poly"))
> y <- 1:9
> x <- rep(1:3, len = 9)
> lm(y ~ as.factor(x))

Call:
lm(formula = y ~ as.factor(x))

Coefficients:
  (Intercept)  as.factor(x)2  as.factor(x)3
            4              1              2

16 Linear Models: Factors IV
Another example:

> options(contrasts = c("contr.sum", "contr.poly"))
> lm(y ~ as.factor(x))

Call:
lm(formula = y ~ as.factor(x))

Coefficients:
  (Intercept)  as.factor(x)1  as.factor(x)2
    5.000e+00     -1.000e+00      ~0 (order e-17, numerically zero)

17 Linear Models: Topics not done...
Other important topics not covered:
  residual analysis
  influential observations
  robust regression
  variable selection
  ...

18 Generalized Linear Models (GLMs)
Y ~ exponential family (normal, binomial, Poisson, ...)

    g(µ) = η(x) = β^T x = β_1 + β_2 x_2 + ... + β_p x_p

g is the link function (known, monotonic, twice differentiable).

    η = Σ_{k=1}^p β_k x_k  is known as the linear predictor.

Proposed by Nelder and Wedderburn (1972), GLMs include the general linear model, logistic regression, probit analysis, Poisson regression, gamma, inverse Gaussian etc. The unification was a major breakthrough in statistical theory.
Estimation: iteratively reweighted least squares (IRLS; see later).

19 Generalized Linear Models (GLMs): The Exponential Family I
The distribution of a univariate r.v. Y belongs to the exponential family if its p.(d).f. f(y; θ) can be written as

    f(y; θ) = exp{p(y) q(θ) + r(y) + s(θ)}.   (6)

Here θ = parameter of interest, and the functions p, q, r, s are known. Other parameters can be accommodated provided they are known; we simply incorporate them in p, q, r and s.
The exponential family has a canonical form where p(y) = y. We also want to be able to explicitly consider scale parameters such as σ in N(µ, σ²), so we write

    f(y; θ, φ) = exp{ [y d(θ) - b(θ)] ω / φ + c(y, φ, ω) }.   (7)

20 Generalized Linear Models (GLMs): The Exponential Family II
Equation (7) belongs to (6) provided φ is known (ω is some known constant here); φ > 0, ω > 0.
θ* = d(θ) is often called the natural parameter of the distribution.

21 Generalized Linear Models (GLMs): The Exponential Family III
(i) Y ~ N(µ, σ²):

    f(y; µ, σ) = (2πσ²)^{-1/2} exp{ -(y - µ)²/(2σ²) }
               = exp{ [yµ - µ²/2]/σ² - y²/(2σ²) - (1/2) log(2πσ²) }.

(ii) Y ~ Poisson(θ):

    f(y; θ) = e^{-θ} θ^y / y! = exp{ y log θ - θ - log y! }.
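As a quick numerical sanity check (a sketch, not from the slides), the Poisson p.f. can be evaluated both directly and via its exponential-family form:

theta <- 2.5
y <- 0:6
cbind(dpois(y, lambda = theta),
      exp(y * log(theta) - theta - lfactorial(y)))   # the two columns are identical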

22 Generalized Linear Models (GLMs): The Exponential Family IV
(iii) Z ~ Binomial(m, p). For z = 0, 1, ..., m,

    P(Z = z; p) = C(m, z) p^z (1 - p)^{m-z}
                = exp{ z log p + (m - z) log(1 - p) + log C(m, z) }
                = exp{ z log[p/(1 - p)] + m log(1 - p) + log C(m, z) }.

If we look at the sample proportion, Y = Z/m, for y = 0, 1/m, ..., m/m,

    P(Y = y; p) = P(Z = my; p)
                = exp[ m{ y log[p/(1 - p)] + log(1 - p) } + log C(m, my) ].

23 Generalized Linear Models (GLMs): The Exponential Family V
Let ℓ(θ) = log L(θ) = log f(Y; θ), u(θ) = ∂ℓ(θ)/∂θ, and I(θ) = -∂²ℓ(θ)/∂θ².
One can show that

    E[Y] = µ = b'(θ)/d'(θ)

and

    Var(Y) = [φ / (ω d'(θ)²)] [ b''(θ) - µ d''(θ) ],   (8)

or

    Var(Y) = b''(θ) φ / (ω d'(θ)²) - φ b'(θ) d''(θ) / (ω {d'(θ)}³).   (9)

24 Generalized Linear Models (GLMs): The Exponential Family VI
If the model is parameterized in terms of the natural parameter θ* (e.g., θ* = log[p/(1 - p)] instead of p) then d'(θ*) = 1 and

    Var(Y) = b''(θ*) φ / ω.   (10)

One then gets the following table.

25 Generalized Linear Models (GLMs): The Exponential Family VII

                     Normal(µ, σ²)                     Poisson(λ)         (1/m) Binomial(m, p)  (Y = sample proportion)
f(y; θ, φ)           (2πσ²)^{-1/2} exp{-(y-µ)²/(2σ²)}  e^{-λ} λ^y / y!    C(m, my) p^{my} (1-p)^{m-my}
Range of Y           (-∞, ∞)                           0, 1, 2, ...       0, 1/m, 2/m, ..., m/m
Mean = E(Y)          µ                                 µ = λ              µ = p
Usual parameter, θ   µ                                 λ                  p
Natural param., θ*   µ                                 log λ = log µ      log[p/(1-p)] = log[µ/(1-µ)]
b(θ*)                µ²/2 (= θ*²/2)                    λ (= e^{θ*})       -log(1-p) = log(1 + e^{θ*})
φ                    σ²                                1                  1
ω                    1                                 1                  m
c(y, φ, ω)           -(y²/φ + log(2πφ))/2              -log y!            log C(m, my)
µ = E[Y]             µ                                 λ (= e^{θ*})       p = e^{θ*}/(1 + e^{θ*})
Variance function    constant (σ²)                     µ                  µ(1 - µ)
(Var(Y) as fn of µ)

26 Generalized Linear Models (GLMs): S and GLMs I
In S use, e.g., glm(y ~ x2 + x3 + x4, family = binomial, data = d).
Family functions are gaussian(), binomial(), poisson(), Gamma(), inverse.gaussian(), quasi().
Generic functions include anova(), coef(), fitted(), plot(), predict(), print(), resid(), summary(), update().
Recall the Wilkinson and Rogers (1973) formula language, e.g., if f1 and f2 are factors and x1 and x2 are numeric, then
    f1 * f2  expands to  1 + f1 + f2 + f1:f2,
    f1/f2    means  f1, and then f2 within factor f1,
    x1 + x2  gives  β_1 X_1 + β_2 X_2.
Data frames hold all the data. Columns are the variables.

27 Generalized Linear Models (GLMs): S and GLMs II

> library(VGAM)    # provides vglm() and binomialff
> data(nzc)
> with(nzc, plot(year, female/(male + female), ylab = "Proportion",
+     main = "Proportion of NZ Chinese that are female",
+     col = "blue", las = 1))
> abline(h = 0.5, lty = "dashed")
> fit.nzc = vglm(cbind(female, male) ~ year, fam = binomialff,
+     data = nzc)
> with(nzc, lines(year, fitted(fit.nzc), col = "red"))

28 Generalized Linear Models (GLMs): S and GLMs III
[Figure: Proportion of NZ Chinese that are female, plotted against year, with the fitted curve.]

29 Generalized Linear Models (GLMs): S and GLMs IV

> with(nzc, plot(year, female/(male + female), ylab = "Proportion",
+     main = "Proportion of NZ Chinese that are female",
+     col = "blue", las = 1))
> abline(h = 0.5, lty = "dashed")
> fit.nzc = vglm(cbind(female, male) ~ poly(year, 2), fam = binomialff,
+     data = nzc)

[Figure: Proportion of NZ Chinese that are female, plotted against year.]

30 Generalized Linear Models (GLMs): Logistic regression I

> options(contrasts = c("contr.treatment", "contr.poly"))
> y <- cbind(c(5, 20, 15, 10), c(20, 10, 10, 10))
> x <- 1:4
> fit <- glm(y ~ as.factor(x), family = binomial)
> fit

Call:  glm(formula = y ~ as.factor(x), family = binomial)

Coefficients:
  (Intercept)  as.factor(x)2  as.factor(x)3  as.factor(x)4
       -1.386          2.079          1.792          1.386

Degrees of Freedom: 3 Total (i.e. Null);  0 Residual
Null Deviance:
Residual Deviance: 4.441e-15    AIC:

> exp(coef(fit)[-1])

31 Generalized Linear Models (GLMs): Logistic regression II

as.factor(x)2  as.factor(x)3  as.factor(x)4
            8              6              4

Here the model is

    logit p(x) = β_0 + β_j,   j = 1, 2, 3, 4,

where β_1 = 0.

32 Generalized Linear Models (GLMs): Logistic regression III
If η(x) = β_0 + β_1 x then the log odds for a change of c units in x is obtained from the logit difference

    η(x + c) - η(x) = c β_1   (11)

and the associated odds ratio is

    ψ(c) = ψ(x + c, x) = exp(c β_1).   (12)

Example: if logit P(D | AGE) = β_0 + 0.13 AGE then an increase in age of 10 years will increase the odds of disease by exp(1.3) ≈ 3.67.
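A small sketch (simulated data, illustrative coefficient values) of the odds ratio (12) for a c-unit change in x, with a Wald-type interval built from Var(β̂_1):

set.seed(3)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.3 * x))
fit <- glm(y ~ x, family = binomial)
cc <- 2                                    # a change of c = 2 units in x
est <- cc * coef(fit)["x"]                 # estimate of c * beta1
se  <- cc * sqrt(vcov(fit)["x", "x"])      # its standard error
exp(est)                                   # estimated odds ratio psi(c) = exp(c * beta1)
exp(est + c(-1.96, 1.96) * se)             # approximate 95% CI for psi(c)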

33 Generalized Linear Models (GLMs): Logistic regression IV
In general, for

    logit p(x) = β_0 + β^T x   (13)

we have

    log ψ = log{ [p(x_1)/(1 - p(x_1))] / [p(x_0)/(1 - p(x_0))] } = β^T (x_1 - x_0).   (14)

Thus, for confidence intervals etc., use

    Var(log ψ̂) = (x_1 - x_0)^T Var(β̂) (x_1 - x_0).   (15)

Note: [p(x_1)/(1 - p(x_1))] / [p(x_0)/(1 - p(x_0))] is the odds ratio for Y = 1 for a person with x_1 relative to a person with x_0.

34 Generalized Linear Models (GLMs): Some extensions of GLMs
Quasi-likelihood (Wedderburn, 1974)
Composite link functions (Thompson and Baker, 1981)
IRLS for other models (Green, 1984)
Double exponential families (Efron, 1986)
Generalized estimating equations (GEE; Liang and Zeger, 1986)
Generalized linear mixed models (GLMMs)
ANOVA splines (Wahba and co-workers, 1995)
Polychotomous regression (Kooperberg and co-workers, 1997)
Multivariate GLMs (Fahrmeir and Tutz, 2001)
Hierarchical GLMs (Nelder and Lee, late 1990s)
Generalized additive models (GAMs; Hastie and Tibshirani, 1986)
Generalized additive mixed models (GAMMs; Lin, 1998)
Vector GLMs and VGAMs (Yee and Wild, 1996)

35 Smoothing
Smoothing is a powerful tool for exploratory data analysis. It allows a data-driven rather than model-driven approach; it allows the data to speak for themselves.
Probably the central idea is localness, i.e., local behaviour versus global behaviour of a function.
Scatterplot data (x_i, y_i), i = 1, ..., n. The classical smoothing problem is

    y_i = f(x_i) + ε_i,   ε_i ~ (0, σ_i²)   (16)

independently. Here, f is an arbitrary smooth function, and i = 1, ..., n.
Q: How can f be estimated?
A: If there is no a priori functional form for f, one solution is the smoother.

36 Smoothing: Uses of Smoothing
Smoothing has many uses, e.g.,
  data visualization and EDA
  prediction
  derivative estimation, e.g., growth curves, acceleration
  used as a basis for many modern statistical techniques

37 Smoothing: Example I
[Figure: scatterplot of y against x.]

38 Smoothing: Example I (continued)
[Figure: scatterplot of y against x.]

39 Smoothing
There are four broad categories of smoothers:
1 series or regression smoothers (polynomials, Fourier regression, regression splines, filtering),
2 kernel smoothers (N-W, locally weighted averages, local regression, loess),
3 smoothing splines (roughness penalties),
4 near-neighbour smoothers (running means, medians, Tukey smoothers).
We will look at kernel smoothers and splines.

40 Smoothing
Scatterplot data (y_i, x_i), i = 1, ..., n. The classical smoothing problem is

    Y_i = f(X_i) + ε_i   (17)

where f = a smooth function estimated from the data, E(ε_i) = 0, Var(ε_i) = σ², independently.
We let Var(ε_i) ∝ w_i^{-1} (known), written Var(ε) = W^{-1}, W = diag(w_1, ..., w_n) = Σ^{-1}.
WLOG assume the data are ordered so that x_1 < x_2 < ... < x_n.

41 Smoothing: Kernel Smoothers I (Nadaraya-Watson Estimator)
Kernel regression estimators are well known, easily understood and mathematically tractable. The Nadaraya-Watson (N-W) estimator estimates f(x) by

    f̂_nw(x) = Σ_{i=1}^n K((x - x_i)/h) y_i / Σ_{i=1}^n K((x - x_i)/h)
             = Σ_{i=1}^n K_h(x - x_i) y_i / Σ_{i=1}^n K_h(x - x_i)   (18)

where

    K_h(u) = h^{-1} K(u/h).   (19)
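A direct sketch of (18)-(19) with a Gaussian kernel (simulated data; the function and bandwidth values are illustrative only):

nw <- function(x, xdat, ydat, h) {
  sapply(x, function(x0) {
    w <- dnorm((x0 - xdat) / h) / h        # K_h(x0 - x_i) with K = Gaussian
    sum(w * ydat) / sum(w)                 # locally weighted average (18)
  })
}
set.seed(4)
xdat <- sort(runif(100))
ydat <- sin(2 * pi * xdat) + rnorm(100, sd = 0.3)
xx <- seq(0, 1, length = 200)
plot(xdat, ydat, col = "blue")
lines(xx, nw(xx, xdat, ydat, h = 0.05), col = "red")      # small h: wiggly
lines(xx, nw(xx, xdat, ydat, h = 0.30), col = "green4")   # large h: smooth, more bias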

42 Smoothing: Kernel Smoothers II (Nadaraya-Watson Estimator)
K, a symmetric unimodal function about 0 which integrates to unity, creates the local averaging of values of y_i whose corresponding values of x_i are close to the point of estimation x. The amount of smoothing is controlled by the bandwidth h. Some popular kernel functions are given in Table 2.
As h decreases, the bias decreases and the variance increases. In practice, the choice of bandwidth h is more crucial than the choice of kernel function.

43 Smoothing: Kernel Smoothers III (Nadaraya-Watson Estimator)
[Figure: regression function.]

44 Smoothing: Kernel Smoothers IV (Nadaraya-Watson Estimator)

Table: Popular kernel functions. Nb. the quartic is also known as the biweight.

Kernel         K(u)
Uniform        (1/2) I(|u| ≤ 1)
Triangle       (1 - |u|) I(|u| ≤ 1)
Epanechnikov   (3/4) (1 - u²) I(|u| ≤ 1)
Quartic        (15/16) (1 - u²)² I(|u| ≤ 1)
Tricube        (70/81) (1 - |u|³)³ I(|u| ≤ 1)
Triweight      (35/32) (1 - u²)³ I(|u| ≤ 1)
Gaussian       exp(-u²/2) / √(2π)
Cosinus        (π/4) cos(πu/2) I(|u| ≤ 1)

45 Smoothing: Local Regression I
Theoretically elegant and also called local polynomial kernel estimation, it has favourable asymptotic properties and boundary behaviour.
Idea: estimate f(x_0) by locally fitting an rth degree polynomial to the data via weighted least squares (WLS).
Example: a local linear kernel estimate for data generated from

    y_i = f(x_i) + ε_i,   f(x) = 2 exp(-x²/0.3²) + 3 exp(-(x - 1)²/0.7²),

with x_i = (i - 1)/n and ε_i ~ N(0, σ = 0.115) independently.

46 Smoothing: Local Regression II
Figure: Local linear kernel estimate (solid red) of the regression function f given in the text, based on 100 simulated observations (crosses). The solid black curve is the true function. The red dashed curves are the kernel weights.

47 Smoothing
We now derive an explicit expression for the local polynomial kernel estimator. Let r be the degree of the polynomial being fitted. At a point x, the estimator f̂(x; r, h) is obtained by fitting the polynomial

    β_0 + β_1 (· - x) + ... + β_r (· - x)^r

to the (x_i, y_i) using WLS with kernel weights K_h(x_i - x). The value of f̂(x; r, h) is the height of the fit, β̂_0, where β̂ = (β̂_0, ..., β̂_r)^T minimizes

    Σ_{i=1}^n {y_i - β_0 - β_1 (x_i - x) - ... - β_r (x_i - x)^r}² K_h(x_i - x).   (20)

The solution is

    β̂ = (X_x^T W_x X_x)^{-1} X_x^T W_x y   (21)

48 Smoothing
where y = (y_1, ..., y_n)^T,

    X_x = [ 1  (x_1 - x)  ...  (x_1 - x)^r ]
          [ :      :               :       ]
          [ 1  (x_n - x)  ...  (x_n - x)^r ]

is n × (r + 1), and W_x = Diag(K_h(x_1 - x), ..., K_h(x_n - x)).
Since the estimator of f(x) is the intercept, we have

    f̂(x; r, h) = e_1^T (X_x^T W_x X_x)^{-1} X_x^T W_x y.   (22)
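A small sketch of (22): fit the degree-r polynomial by WLS with kernel weights and read off the intercept (simulated data; the Gaussian kernel and bandwidth are illustrative choices):

locpoly1 <- function(x0, xdat, ydat, r = 1, h = 0.2) {
  Xx <- outer(xdat - x0, 0:r, "^")                 # columns 1, (x_i - x0), ..., (x_i - x0)^r
  wts <- dnorm((xdat - x0) / h) / h                # K_h(x_i - x0)
  beta <- solve(t(Xx) %*% (wts * Xx), t(Xx) %*% (wts * ydat))
  beta[1]                                          # e_1' betahat = fhat(x0; r, h)
}
set.seed(5)
xdat <- runif(150)
ydat <- cos(3 * xdat) + rnorm(150, sd = 0.2)
xx <- seq(0, 1, length = 100)
plot(xdat, ydat, col = "blue")
lines(xx, sapply(xx, locpoly1, xdat = xdat, ydat = ydat, r = 1, h = 0.1), col = "red")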

49 Smoothing
Simple explicit formulae exist for the N-W estimator (r = 0):

    f̂(x; 0, h) = Σ_{i=1}^n K_h(x_i - x) y_i / Σ_{i=1}^n K_h(x_i - x)   (23)

and the local linear estimator (r = 1):

    f̂(x; 1, h) = n^{-1} Σ_{i=1}^n {ŝ_2(x; h) - ŝ_1(x; h)(x_i - x)} K_h(x_i - x) y_i / [ŝ_2(x; h) ŝ_0(x; h) - ŝ_1(x; h)²]   (24)

where

    ŝ_r(x; h) = n^{-1} Σ_{i=1}^n (x_i - x)^r K_h(x_i - x).   (25)

50 Smoothing: Derivative Estimation I
Uses include the study of human growth curves, where the first two derivatives of height as a function of age (the speed and acceleration of growth) have important biological significance.
The extension of local polynomial ideas to estimate the νth derivative is straightforward. One can estimate f^(ν)(x) via the intercept coefficient of the νth derivative of the local polynomial being fitted at x, assuming ν ≤ r. In general,

    f̂^(ν)(x; r, h) = ν! e_{ν+1}^T (X_x^T W_x X_x)^{-1} X_x^T W_x y,   for all ν = 0, ..., r,   (26)

from (22). Note that f̂^(ν)(x; r, h) is not in general equal to the νth derivative of f̂(x; r, h).

51 Smoothing: Derivative Estimation II (Choosing r)
In the early 1990s Fan and co-workers showed that, for estimating f^(ν)(x), there is no increase in variability when passing from an even (i.e., r - ν even) r = ν + 2q order fit to an odd r = ν + 2q + 1 order fit, but when passing from an odd r = ν + 2q + 1 order fit to the consecutive even order r = ν + 2q + 2 there is a price to be paid in terms of increased variability. Therefore, even order fits r = ν + 2q are not recommended.
Fan and Gijbels (1996) recommend using the lowest odd order, i.e., r = ν + 1, or occasionally r = ν + 3.
For f choose r = 1 (maybe 3); for f' choose r = 2 (maybe 4).

52 Smoothing: Lowess and Loess I
A popular method based on local regression is lowess (Cleveland, 1979) and loess (Cleveland and Devlin, 1988). Lowess = locally weighted scatterplot smoother; it robustifies the locally WLS method above.
The basic idea is to fit a polynomial of degree r locally via (20) and obtain the fitted values. Then calculate the residuals and assign weights to each residual: large/small residuals receive small/large weights respectively. Then perform another local polynomial fit of order r with weights given by the product of the initial weight and the new weight. Thus observations showing large residuals at the initial fit are downweighted in the second fit.
The above process is repeated a number of times. Cleveland (1979) recommended r = 1 and 3 iterations (the default).

53 Smoothing: Lowess and Loess II

> par(mfrow = c(2, 2), mar = c(5, 4, 2, 1) + 0.1)
> set.seed(761)
> x <- sort(rnorm(100))
> eps <- rnorm(100, 0, 0.1)
> y <- sin(x) + eps
> plot(x, y, col = "blue", pch = 4)
> title("Default: lowess(x, y)", cex = 0.5)
> lo1 <- lowess(x, y)
> lines(lo1, lty = 1, col = "red")
> plot(x, y, col = "blue", pch = 4)
> title("lowess(x, y, f=0.5)", cex = 0.5)
> lo2 <- lowess(x, y, f = 0.5)
> lines(lo2, lty = 1, col = "red")
> plot(x, y, col = "blue", pch = 4)
> title("lowess(x, y, f=0.2)", cex = 0.5)
> lo3 <- lowess(x, y, f = 0.2)
> lines(lo3, lty = 1, col = "red")
> plot(x, y, col = "blue", pch = 4)
> title("Default: loess(y ~ x)", cex = 0.5)

54 Smoothing: Lowess and Loess III

> lo4 <- loess(y ~ x)
> lines(x, fitted(lo4), lty = 1, col = "red")

[Figure: four panels, "Default: lowess(x, y)", "lowess(x, y, f=0.5)", "lowess(x, y, f=0.2)" and "Default: loess(y ~ x)", each showing the data and the fitted curve.]

55 Smoothing: Lowess and Loess IV
Once again, choosing a good bandwidth is crucial.

56 Smoothing: Local Likelihood I
Local likelihood replaces the local least squares criterion by an appropriate local log-likelihood criterion.
Example: for binary data (x_i, y_i), i = 1, ..., n, y_i = 0 or 1, the local log-likelihood is

    Σ_{i=1}^n K((x_i - x)/h) {y_i log p_i + (1 - y_i) log(1 - p_i)}   (27)

where p_i = p(x_i) = P(Y = 1 | x_i).
We could model p(x) directly using local polynomials; however, it is usually preferable to use θ(x) = logit p(x). We approximate θ(x) locally by a polynomial, then choose the polynomial coefficients to maximize the likelihood.

57 Smoothing: Local Likelihood II
Local likelihood can also be applied to other regression models and to density estimation.
Local likelihood was developed by Tibshirani (1984). A good book on the topic is Loader (1999).

58 Smoothing: Regression Splines I
Idea: fit a higher degree polynomial (polynomial regression). Some drawbacks:
  polynomials aren't very local but have a global nature, so they usually misbehave at the boundaries, especially if the degree of the polynomial is high [cf. Stone-Weierstrass Theorem];
  individual observations can have a large influence on remote parts of the curve;
  the polynomial degree cannot be controlled continuously.
Polynomial regression can be fitted using the poly() function, e.g.,

> fit <- lm(y ~ poly(x, 5))

fits a 5th degree polynomial.

59 Smoothing: Regression Splines II
Regression splines use a piecewise polynomial. The regions are separated by knots (or breakpoints). The positions where each pair of segments join are called joints. The more knots, the more flexible the family of curves becomes.
It is customary to force the piecewise polynomials to join smoothly at these knots. A popular choice is piecewise cubic polynomials with continuous 0th, 1st and 2nd derivatives, called cubic splines. Using splines of degree > 3 seldom yields any advantage.
Given a set of knots, the smooth is computed by multiple regression on a set of basis vectors.

60 Smoothing: Regression Splines III
Here's a regression spline.

> pos <- function(x) ifelse(x > 0, x, 0)
> x <- 1:7
> y <- c(8, 3, 8, 5, 9, 14, 11)
> knot <- 4
> plot(x, y, col = "blue")
> X <- cbind(1, x, x^2, x^3, pos(x - knot)^3)
> fit <- lm(y ~ X - 1)
> xx <- seq(1, 7, length = 200)
> XX <- cbind(1, xx, xx^2, xx^3, pos(xx - knot)^3)
> lines(xx, XX %*% coef(fit))
> abline(v = knot, lty = "dashed", col = "purple")
> X

61 Smoothing: Regression Splines IV
The design matrix X = cbind(1, x, x^2, x^3, pos(x - knot)^3) is

            x
[1,]  1  1   1    1   0
[2,]  1  2   4    8   0
[3,]  1  3   9   27   0
[4,]  1  4  16   64   0
[5,]  1  5  25  125   1
[6,]  1  6  36  216   8
[7,]  1  7  49  343  27

62 Smoothing: Regression Splines V
[Figure: the data and the fitted regression spline, with the knot at x = 4 marked by a dashed vertical line.]

63 Smoothing: Regression Splines VI
Definitions: a function f ∈ C^k[a, b] if the derivatives f', f'', ..., f^(k) all exist and are continuous on [a, b]; e.g., |x| ∉ C¹[a, b] (when 0 lies inside [a, b]).
Notes:
1 f ∈ C^k[a, b] implies f ∈ C^{k-1}[a, b].
2 C[a, b] ≡ C^0[a, b] = {f(t) : f(t) continuous and real valued, a ≤ t ≤ b}.
There are at least two bases for cubic splines:
1 truncated power series: easier to understand but not used in practice,
2 B-splines: harder to understand but used in practice.

64 Smoothing: Regression Splines VII
Advantages of regression splines:
  computationally and statistically simple,
  standard parametric inferences are available. For example, whether a knot can be removed and the same polynomial equation used to explain two adjacent segments can be tested by H_0: θ_j = 0, which is one of the t-tests always printed by a regression program.
Disadvantages of regression splines:
  difficult to choose the number of knots,
  difficult to choose the position of the knots,
  the smoothness of the estimate cannot be varied continuously as a function of a single smoothing parameter.

65 Smoothing: Regression Splines VIII
Here is a more formal definition of a spline. In mathematics, a spline denotes a function s(x) which is essentially a piecewise polynomial over an interval (a, b), such that a certain number of its derivatives are continuous for all points in (a, b).
More precisely, s(x) is a spline of degree r (some given positive integer) with knots ξ_1, ..., ξ_K (such that a < ξ_1 < ξ_2 < ... < ξ_K < b) if it satisfies the following properties:
  on any subinterval (ξ_j, ξ_{j+1}), s(x) is a polynomial of degree r (order r + 1);
  s'(x), ..., s^{(r-1)}(x) are continuous, i.e., s ∈ C^{r-1}(a, b);
  the rth derivative of s(x) is a step function with jumps at ξ_1, ..., ξ_K.
Often r is chosen to be 3, and the term cubic spline is then used for the associated curve.

66 Smoothing: Regression Splines IX
Wold (1974), in a paper reflecting a lot of experience fitting regression splines, made the following recommendations when using cubic splines:
1 Knot points should be located at data points,
2 Have as few knots as possible, ensuring that a minimum of 4 or 5 observations fall between knot points,
3 No more than one extremum point and one inflexion point should fall between knots (because a cubic is not capable of approximating more variation),
4 Extrema should be centred in intervals and inflexion points should be located near knot points.

67 Smoothing: B-Splines I
B-splines form a numerically stable basis for splines. It is convenient to consider splines of a general order, M say.
1 M = 4: cubic spline.
2 M = 3: quadratic spline, which has continuous derivatives up to order M - 2 = 1 at the knots; this is aka a parabolic spline.
3 M = 2: linear spline, which has continuous derivatives up to order M - 2 = 0 at the knots, i.e., the function is continuous.
Let ξ_0 (< ξ_1) and ξ_{K+1} (> ξ_K) be two boundary knots. Define the augmented knot sequence {τ} such that
  τ_1 ≤ τ_2 ≤ ... ≤ τ_M ≤ ξ_0;
  τ_{j+M} = ξ_j,  j = 1, ..., K;
  ξ_{K+1} ≤ τ_{K+M+1} ≤ ... ≤ τ_{K+2M}.

68 Smoothing: B-Splines II
The actual values of these additional knots beyond the boundary are arbitrary, and it is customary to make them all the same and equal to ξ_0 and ξ_{K+1} respectively.
Denote by B_{i,m}(x) the ith B-spline basis function of order m for the knot sequence {τ}, m ≤ M. They are defined recursively as follows. For i = 1, ..., K + 2M - 1,

    B_{i,1}(x) = 1 if τ_i ≤ x < τ_{i+1}, and 0 otherwise.   (28)

69 Smoothing: B-Splines III
Then for i = 1, ..., K + 2M - m,

    B_{i,m}(x) = [(x - τ_i)/(τ_{i+m-1} - τ_i)] B_{i,m-1}(x) + [(τ_{i+m} - x)/(τ_{i+m} - τ_{i+1})] B_{i+1,m-1}(x)   (29)

(de Boor, 1978). He derived stable and efficient recursive algorithms for computing them.
Thus with m = 4, B_{i,4}, i = 1, ..., K + 4, are the K + 4 cubic B-spline basis functions for the knot sequence {ξ}. This recursion can be continued and will generate the B-spline basis for any order spline.
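A recursive sketch of (28)-(29); the hypothetical function below can be checked against splines::splineDesign() or bs():

bspl <- function(x, tau, i, m) {
  if (m == 1)
    return(as.numeric(tau[i] <= x & x < tau[i + 1]))          # (28)
  a1 <- if (tau[i + m - 1] > tau[i])                          # guard 0/0 at repeated knots
    (x - tau[i]) / (tau[i + m - 1] - tau[i]) else 0
  a2 <- if (tau[i + m] > tau[i + 1])
    (tau[i + m] - x) / (tau[i + m] - tau[i + 1]) else 0
  a1 * bspl(x, tau, i, m - 1) + a2 * bspl(x, tau, i + 1, m - 1)   # (29)
}
tau <- c(0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4)    # augmented knots: M = 4, interior knots 1, 2, 3
xx <- seq(0, 3.99, by = 0.01)
plot(xx, bspl(xx, tau, i = 4, m = 4), type = "l", ylab = "B_{4,4}(x)")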

70 Smoothing: B-Splines IV

> knots <- c(1:3, 5, 7, 8, 10)
> atx <- seq(0, 11, by = 0.01)
> mycol = (1:(22 + 1))[-7]
> for (ord in 2:5) {
+     B <- bs(x = atx, degree = ord - 1, knots = knots,
+         intercept = TRUE)
+     matplot(atx, B[, 1], type = "l", ylim = 0:1,
+         lty = 2, ylab = "", xlab = "")
+     matlines(atx, B[, -1], col = mycol, lty = 1)
+     title(paste("B-splines of order", ord))
+     abline(v = knots, lty = 2, col = "purple")
+ }
> attr(B, "degree")
[1] 4
> attr(B, "knots")
[1]  1  2  3  5  7  8 10

71 Smoothing: B-Splines V

> attr(B, "Boundary.knots")
[1]  0 11
> attr(B, "intercept")
[1] TRUE
> attr(B, "class")
[1] "bs"    "basis"

72 Smoothing: B-Splines VI
[Figure: four panels showing the B-spline basis functions of orders 2, 3, 4 and 5 for the knot sequence above.]

73 Smoothing: B-Splines VII
In general, bs() adds ord boundary knots to each end, where the boundary knot values are min(x_i) and max(x_i). If intercept = FALSE then the left-most function/column is omitted.

74 Smoothing: B-Splines VIII
To illustrate that linear combinations of the B-spline basis functions do accommodate smooth curves,

> matplot(atx, 2 * B[, 3] - 5 * B[, 4] + 3 * B[, 7], type = "l",
+     lwd = 2, col = "blue")

[Figure: the curve 2 * B[, 3] - 5 * B[, 4] + 3 * B[, 7] plotted against atx.]

75 Smoothing: B-Splines IX
Here are some additional notes:
1
> args(bs)
function (x, df = NULL, knots = NULL, degree = 3, intercept = FALSE,
    Boundary.knots = range(x))
NULL
In fact, in Value:, df should be length(knots) + degree + intercept.
2 Safe prediction is not as good as smart prediction, e.g., I(bs(x)), poly(scale(x), 2).

76 Smoothing: B-Splines X
3 B-splines are actually defined by means of divided differences. B_{i,m}, which is based on knots τ_i, ..., τ_{i+m}, is defined as

    B_{i,m}(x) = (τ_{i+m} - τ_i) Σ_{j=i}^{i+m} (x - τ_j)_+^{m-1} / Π_{s=i, s≠j}^{i+m} (τ_j - τ_s).   (30)

Equation (29) follows from this.

77 Smoothing: B-Splines XI
As an illustration,

> library(splines)
> n <- 50
> set.seed(760)
> knots <- 1:5
> x <- seq(0, 2 * pi, length = n)
> y <- sin(x) + rnorm(n, sd = 0.5)
> plot(x, y, col = "blue", pch = 4)
> fit <- lm(y ~ bs(x, knots = knots))
> abline(v = knots, lty = "dashed", col = "purple")
> lines(x, sin(x), col = "black")
> aknots = c(-Inf, knots, Inf)
> for (ii in 2:length(aknots)) {
+     newx = seq(max(aknots[ii - 1], min(x)), min(aknots[ii], max(x)),
+         len = 200)
+     lines(newx, predict(fit, data.frame(x = newx)),
+         col = ii - 1, lwd = 2)
+ }

78 Smoothing: B-Splines XII
Overall, the fit is ok, but could be improved by decreasing the number of knots and heeding the recommendations of Wold (1974).

[Figure: the data, the true function sin(x), and the fitted regression spline drawn segment by segment between the knots.]

79 Smoothing: B-Splines XIII
Knots with varying multiplicities have an effect illustrated by the following.

[Figure: four panels showing B-spline bases with knots of multiplicity 1, 2, 3 and 4.]

80 Smoothing: Natural Splines I
A cubic spline on [a, b] is a natural cubic spline (NCS) if its 2nd and 3rd derivatives are 0 at a and b (natural boundary conditions).
Natural splines, a restricted form of B-splines, have been implemented by the function ns(). Given knots ξ_1, ..., ξ_K, ns() is linear on (-∞, ξ_0] and [ξ_{K+1}, ∞), where ξ_0 and ξ_{K+1} are two extra knots; ns() chooses these to be the minimum and maximum of the x_i respectively. The result is K + 2 parameters.

81 Smoothing: Natural Splines II
Here's an example.

> set.seed(21)
> nn = 20
> x = seq(0, 1, len = nn)
> y = runif(nn)
> myknots = c(0.3, 0.7)
> plot(x, y, xlim = c(-0.5, 1.5), col = "blue")
> fit = lm(y ~ ns(x, knot = myknots))
> newx = seq(-0.5, 2.5, len = 100)
> lines(newx, predict(fit, data.frame(x = newx)),
+     col = "blue")
> abline(v = c(range(x), myknots), col = "purple",
+     lty = "dashed")
> coef(fit)

82 Smoothing: Natural Splines III

(Intercept)  ns(x, knot = myknots)1  ns(x, knot = myknots)2  ns(x, knot = myknots)3

83 Smoothing: Natural Splines IV
[Figure: the data and the fitted natural spline from the code above, plotted over a range extending beyond the data.]

84 Smoothing: Smoothing splines I
Cubic smoothing splines minimize

    S(f) = Σ_{i=1}^n (y_i - f(x_i))² + λ ∫_a^b {f''(x)}² dx   (31)

over a Sobolev space of order 2. Here, a < x_1 < ... < x_n < b for some a and b, and λ ≥ 0.
The terms of S(f):
1 the first penalizes lack of fit;
2 the second penalizes wiggliness.
These two conflicting quantities are weighted by the non-negative smoothing parameter λ.

85 Smoothing: Smoothing splines II
Larger values of λ produce smoother curves. As λ → ∞, f''(x) → 0 and the solution is a least squares line. As λ → 0, the solution tends to an interpolating twice-differentiable function.
(31) fits into the penalty function approach (Green and Silverman, 1994). Penalized least squares minimizes

    (y - f)^T Σ^{-1} (y - f) + f^T K f.

Solution: f̂ = A(λ) y where A(λ) = (I_n + Σ K)^{-1} is the influence or smoother matrix.
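A small sketch of the penalized least squares solution, using a second-difference penalty matrix K = D'D as a discrete stand-in for the integrated squared second derivative, with Σ = I and the multiplier on K playing the role of λ:

set.seed(6)
n <- 100
x <- seq(0, 1, length = n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
D <- diff(diag(n), differences = 2)        # (n-2) x n second-difference operator
K <- t(D) %*% D                            # discrete roughness penalty
fhat <- function(lambda) solve(diag(n) + lambda * K, y)   # (I + lambda K)^{-1} y
plot(x, y, col = "blue")
lines(x, fhat(10),  col = "red")           # moderate smoothing
lines(x, fhat(1e5), col = "green4")        # heavy smoothing: approaches a straight line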

86 Smoothing: Some notes I
Here are some notes:
1 The smoothing parameter λ can be regarded as the turning knob which controls the tradeoff between fidelity to the data and smoothness. One can select λ by trial and error.
2 The penalty term is justified by physics: energy ∝ ∫_a^b curvature², which is approximated by ∫_a^b f''(t)² dt (cf. Hooke's Law).
3 Importantly, Reinsch (1967) showed, using the calculus of variations, that the solution of (31) is a cubic spline with knots at the unique values of the x_i. It can be shown that minimizing (31) is equivalent to minimizing ∫_a^b {f''(x)}² dx subject to Σ_{i=1}^n {y_i - f(x_i)}² ≤ σ.
4 As n → ∞, λ should become smaller.

87 Smoothing: Some notes II
5 There are alternative regularizations, e.g.,

    ∫_a^b f'(x)² dx   (32)

whose solution is a linear spline. In general, ∫_a^b f^{(ν)}(x)² dx produces a spline of degree 2ν - 1. Note we never get an even degree spline, not unless fractional derivatives are used.
6 S_2[a, b] is actually a Sobolev space of order 2. In general, a Sobolev space of order m is

    W_2^m[a, b] = {f : f^{(j)}, j = 0, ..., m - 1, is absolutely continuous on [a, b]; f^{(m)} ∈ L_2[a, b]},

i.e., ∫_a^b {f^{(m)}(t)}² dt < ∞.

88 Smoothing: Some notes III
How can we compute a cubic smoothing spline? There are several ways:
1 Direct method. Not recommended (O(n³)).
2 State-space approach (O(n)).
3 B-splines, a numerically stable method (O(n)).
4 Reinsch algorithm (O(n)).

89 Smoothing

> args(smooth.spline)
function (x, y = NULL, w = NULL, df, spar = NULL, cv = FALSE,
    all.knots = FALSE, nknots = NULL, keep.data = TRUE, df.offset = 0,
    penalty = 1, control.spar = list())
NULL

For basic use, use the argument df. Here's an example.

> data(cars)
> with(cars, plot(speed, dist, main = "data(cars) & smoothing splines"))
> cars.spl <- with(cars, smooth.spline(speed, dist))
> cars.spl

Call:
smooth.spline(x = speed, y = dist)

Smoothing Parameter  spar=          lambda=          (11 iterations)
Equivalent Degrees of Freedom (Df):
Penalized Criterion:
GCV:

90 Smoothing
This example has duplicate points, so avoid cv = TRUE.

> lines(cars.spl, col = "blue")
> with(cars, lines(smooth.spline(speed, dist, df = 10),
+     lty = 2, col = "red", lwd = 2))
> with(cars.spl, legend(5, 120, c(paste("default [C.V.] => df =",
+     round(df, 1)), "s( * , df = 10)"), col = c("blue",
+     "red"), lty = 1:2, lwd = 1:2, bg = "bisque"))

[Figure: data(cars) & smoothing splines; dist against speed, with the default fit (df = 2.6) and the df = 10 fit.]

91 Smoothing: Some General Theory I
In scatterplot smoothing there is a fundamental trade-off between the bias and variance of the estimate, and this phenomenon is governed by the smoothing parameter. An optimal choice of span would trade the bias off against the variance. One such criterion is the mean square error (MSE):

    E[(f̂_k(x_i) - f(x_i))²] = Var(f̂_k(x_i)) + {E[f̂_k(x_i)] - f(x_i)}².

92 Smoothing: Linear Smoothers I
A smoother is linear if

    S(a y_1 + b y_2 | x) = a S(y_1 | x) + b S(y_2 | x)   (33)

for any constants a and b. That is,

    ŷ = S y   (34)

where S does not depend on y. S is referred to as the influence (or smoother) matrix.
Examples: the bin, running-mean, running-line, regression spline, cubic spline, kernel and local polynomial kernel smoothers are all linear smoothers (with a fixed smoothing parameter).

93 Smoothing: Linear Smoothers II
The theory for linear smoothers is much simpler than for nonlinear smoothers. Many properties of smoothers can be seen from the eigenvalues and eigenvectors of S.
For example, for a cubic smoothing spline, S(λ) has all its eigenvalues in (0, 1], with exactly two unit eigenvalues whose corresponding eigenvectors are 1 and x. That is, S 1 = 1 and S x = x. These correspond to constant and linear functions.
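One can look at S empirically: a sketch that recovers the smoother matrix of smooth.spline() column by column (by smoothing unit vectors at a fixed amount of smoothing) and checks the two unit eigenvalues:

set.seed(7)
n <- 30
x <- sort(runif(n))
S <- sapply(1:n, function(j) {
  ej <- numeric(n); ej[j] <- 1
  smooth.spline(x, ej, spar = 0.7, all.knots = TRUE)$y   # jth column of S
})
max(abs(S %*% rep(1, n) - 1))        # ~0:  S 1 = 1
max(abs(S %*% x - x))                # ~0:  S x = x
range(Re(eigen(S)$values))           # eigenvalues lie in (0, 1]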

94 Smoothing: Linear Smoothers III
Figure: Eigenvalues of a cubic smoothing spline.

95 Smoothing: Degrees of Freedom I
All smoothers allow the user to vary the amount of smoothing done via the smoothing parameter, e.g., the bandwidth, the span, or λ. However, it would be useful to have some measure of the amount of smoothing done. One such measure is the effective degrees of freedom (EDF) of a smooth. It is useful for a number of reasons, e.g., comparing different types of smoothers while keeping the amount of smoothing roughly equal.
The theory of EDF is a natural extension of standard results for the general linear model. Recall that, if β is p × 1,

    Y = Xβ + ε,   Var(ε) = σ² I.

96 Smoothing: Degrees of Freedom II
1 Ŷ = P y where P = X(X^T X)^{-1} X^T is idempotent of rank p. Then trace(P) = p,
2 trace(Var(Ŷ)) = σ² trace(P P^T) = σ² p,
3 E[(n - p) S²] = E[ResSS] = E[(Y - Ŷ)^T (Y - Ŷ)] = σ² (n - p).
By replacing P by S, these results suggest the following three definitions for the effective degrees of freedom of a smooth:
1 df = trace(S),
2 df_var = trace(S S^T), and
3 df_err = n - trace(2S - S S^T).
More generally, with weights W, these are
1 df = trace(S),
2 df_var = trace(W S W^{-1} S^T), and
3 df_err = n - trace(2S - S^T W S W^{-1}).
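A sketch computing the three EDF measures for a cubic smoothing spline from its (empirically recovered) smoother matrix, with unit weights:

set.seed(8)
n <- 40
x <- sort(runif(n))
S <- sapply(1:n, function(j) {
  ej <- numeric(n); ej[j] <- 1
  smooth.spline(x, ej, df = 6, all.knots = TRUE)$y
})
c(df     = sum(diag(S)),                        # trace(S); close to the target 6
  df.var = sum(diag(S %*% t(S))),               # trace(S S')
  df.err = n - sum(diag(2 * S - S %*% t(S))))   # n - trace(2S - S S')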

97 Smoothing: Degrees of Freedom III
It can be shown that if S is a symmetric projection matrix then trace(S), trace(2S - S S^T) and trace(S S^T) coincide.
For cubic smoothing splines, it can be shown that

    trace(S S^T) ≤ trace(S) ≤ trace(2S - S S^T)

and that all three of these functions are decreasing in λ.
Notes:
1 df is the most popular and easiest to compute. The cost of df is O(n) for most smoothers.

98 Smoothing: Degrees of Freedom IV
2 We have 2 ≤ degrees of freedom ≤ the number of distinct x_i. A linear fit corresponds to 2; degrees of freedom equal to the number of distinct x_i corresponds to an interpolant. As the degrees of freedom increases, the fit becomes more wiggly.
A smooth with 3 degrees of freedom has approximately the same flexibility as a quadratic. A value of 4 or 5 degrees of freedom is often used as the default value in software, as this can accommodate a reasonable amount of nonlinearity without being excessive.

99 Smoothing: Standard Errors I

    f̂ = S y   implies   Var(f̂) = σ² S S^T.   (35)

One can form pointwise SE bands for f̂ (useful in preventing the over-interpretation of a plot of the estimated function). But (35) is impractical if n is large (all of S is needed).
Trick: for cubic smoothing splines, Silverman (1985) uses a Bayesian derivation to discuss the use of the alternative σ² S. Its cost is O(n).

100 Smoothing: Equivalent Kernels I
Consider ŷ = S y for a linear smoother. Plotting the jth row of S versus the x_i gives the weights used for the estimate ŷ_j. This mimics the kernel function of a kernel smoother.
The EK for a cubic smoothing spline is

    κ(u) = (1/2) exp(-|u|/√2) sin(|u|/√2 + π/4)

as n → ∞ (Silverman, 1984).

101 Smoothing: Equivalent Kernels II
[Figure: the equivalent kernel, EK plotted against u.]
If the design points x_i have a local density g(x), and if x is not too near the boundary and λ is not too big or too small, then the local bandwidth h(x) satisfies

    h(x) = {λ / (n g(x))}^{1/4}.

102 Smoothing: Automatic Smoothing Parameter Selection I
Choosing the bandwidth/smoothing parameter is the most important decision for a specified method. We want an automatic way of choosing the right smoothing parameter. A popular method is cross-validation (CV); we restrict attention to linear smoothers.
CV idea: leave one point (x_i, y_i) out at a time and estimate the smooth at x_i based on the remaining n - 1 points. Choose λ_CV to minimize the cross-validation sum of squares

    CV(λ) = (1/n) Σ_{i=1}^n { y_i - f̂_λ^{(-i)}(x_i) }²   (36)

103 Smoothing: Automatic Smoothing Parameter Selection II
where f̂_λ^{(-i)}(x_i) is the fitted value at x_i, computed by leaving out the ith data point.
One can compute (36) naïvely, but there is a trick. Define f̂_λ^{(-i)}(x_i) to be the fit obtained by setting the weight of the ith observation to zero and increasing the remaining weights so that they sum to unity, i.e.,

    f̂_λ^{(-i)}(x_i) = Σ_{j≠i} [ s_ij / (1 - s_ii) ] y_j.   (37)

This means

    f̂_λ^{(-i)}(x_i) = Σ_{j≠i} s_ij y_j + s_ii f̂_λ^{(-i)}(x_i)   (38)

104 Smoothing: Automatic Smoothing Parameter Selection III
and

    y_i - f̂_λ^{(-i)}(x_i) = [ y_i - f̂_λ(x_i) ] / (1 - s_ii).   (39)

Thus, CV(λ) can be written

    CV(λ) = (1/n) Σ_{i=1}^n { [ y_i - f̂_λ(x_i) ] / [ 1 - s_ii(λ) ] }².   (40)

So there is no need to compute f̂_λ^{(-i)}(x_i) naïvely. In practice, CV sometimes gives questionable performance.
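A sketch of (40) using the leverage values (the s_ii) returned by smooth.spline(), so the leave-one-out score needs no refitting; the data and df grid are illustrative:

set.seed(9)
n <- 60
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
cv.score <- function(df) {
  fit <- smooth.spline(x, y, df = df, all.knots = TRUE)
  mean(((y - fit$y) / (1 - fit$lev))^2)          # (40): fit$lev holds the s_ii
}
dfs <- seq(3, 15, by = 0.5)
plot(dfs, sapply(dfs, cv.score), type = "b", xlab = "df", ylab = "CV")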

105 Smoothing: Generalized Cross-Validation
A variant of CV(λ) is generalized cross-validation (GCV). The GCV idea is to replace s_ii by its average value, trace(S)/n, which is easier to compute:

    GCV(λ) = (1/n) Σ_{i=1}^n { [ y_i - f̂_λ(x_i) ] / [ 1 - trace(S)/n ] }².

GCV tends to undersmooth.

106 Smoothing: Testing for nonlinearity
Suppose we wish to compare two smooths f̂_1 = S_1 y and f̂_2 = S_2 y. For example, the smooth f̂_2 might be rougher than f̂_1, and we wish to test whether it picks up any significant bias. A standard case that often arises is when f̂_1 is linear, in which case we want to test if the linearity is real. We must assume that f̂_2 is unbiased, and that f̂_1 is unbiased under H_0.
Letting ResSS_j be the residual sum of squares for the jth smooth and γ_j = trace(2 S_j - S_j^T S_j), then

    [ (ResSS_1 - ResSS_2)/(γ_2 - γ_1) ] / [ ResSS_2/(n - γ_1) ]  ~  F_{γ_2 - γ_1, n - γ_1}   (41)

approximately.

107 Smoothing: The Curse of Dimensionality I
Sometimes multidimensional smoothers can work with a moderate number of inputs. But the curse of dimensionality hinders them in higher dimensions:
  local neighbourhoods are empty, or nearest-neighbourhoods are not local,
  all points are close to the boundary,
  sample sizes need to grow exponentially.

108 Smoothing: The Curse of Dimensionality II
That is, neighbourhoods with a fixed number of points become less local as the dimension increases. For fixed n, the data become more isolated in d-space and smoothers require a larger neighbourhood to find enough data points in order to calculate the variance of an estimate. Hence the estimate is no longer local and can be severely biased.
The following illustrates the curse of dimensionality. Suppose we have data uniformly distributed in a d-dimensional unit cube. We spread out a subcube from the origin to capture span% of the data. What distance do we have to reach out on each axis? The next figure gives the answer (a small numerical sketch also follows).
Most reasonable high-dimensional procedures assume some structure.
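The distance in question is easy to compute (a one-line sketch): a subcube capturing a fraction span of uniform data in d dimensions needs side length span^(1/d).

span <- seq(0.01, 0.99, by = 0.01)
plot(span, span, type = "l", xlab = "span", ylab = "distance on each axis")  # d = 1
for (d in c(2, 3, 10)) lines(span, span^(1/d), lty = 2)                      # d = 2, 3, 10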

109 Smoothing: The Curse of Dimensionality III
Figure: Distance on each axis of a subcube required to capture span% of the data of a d-dimensional unit cube, for d = 1, 2, 3, 10.

110 Generalized Additive Models (GAMs) I
For general p, the linear model is

    Y = β_1 X_1 + ... + β_p X_p + ε,   ε ~ N(0, σ²) independently.   (42)

This model has some strong assumptions:
1 Linearity, i.e., the effect of each X_k on E(Y) is linear,
2 Normal errors with zero mean, constant variance, and independence,
3 Additivity, i.e., X_k and X_l do not interact; they have an additive effect on the response.

111 Generalized Additive Models (GAMs) II
We relax the linearity assumption. The linear predictor becomes an additive predictor, a sum of arbitrary smooth functions:

    η(x) = f_1(x_1) + ... + f_p(x_p).   (43)

Additivity is still assumed. Easy to interpret.
Identifiability: the f_k(x_k) are centred.
Very useful for exploratory data analysis; allows the data to speak for themselves.
Some GAM books are Hastie and Tibshirani (1990) and Wood (2006).

112 Generalized Additive Models (GAMs) I
Fit an additive model by backfitting. It is an iterative procedure that smooths partial residuals against each x_t: since

    E(y | x) = f_t(x_t) + Σ_{k=1, k≠t}^p f_k(x_k),

we have

    f_t(x_t) = E[ y - Σ_{k=1, k≠t}^p f_k(x_k) | x_t ].

Modified backfitting is possible and is implemented. It decomposes

    η(x) = X β + Σ_{k=1}^p r_k(x_k),

i.e., into linear and nonlinear components.
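A bare-bones sketch of backfitting for two covariates, using smooth.spline() with a fixed df as the scatterplot smoother (all names and settings here are illustrative):

set.seed(10)
n <- 200
x1 <- runif(n); x2 <- runif(n)
y <- sin(2 * pi * x1) + 4 * (x2 - 0.5)^2 + rnorm(n, sd = 0.3)
sm <- function(x, r) predict(smooth.spline(x, r, df = 5), x)$y   # smooth residuals r on x
alpha <- mean(y)
f1 <- f2 <- rep(0, n)
for (it in 1:20) {                                   # cycle over the smooths until they settle
  f1 <- sm(x1, y - alpha - f2); f1 <- f1 - mean(f1)  # centre each f_k (identifiability)
  f2 <- sm(x2, y - alpha - f1); f2 <- f2 - mean(f2)
}
par(mfrow = c(1, 2))
plot(x1, f1, col = "blue"); plot(x2, f2, col = "blue")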

113 Generalized Additive Models (GAMs) I: Example 1, Kauri data
Y = presence/absence of a tree species, agaaus, which is Agathis australis, better known as Kauri, NZ's most famous tree. Data are from 392 sites in the Hunua forest near Auckland.

Figure: Big Kauri tree.


More information

Spatial Process Estimates as Smoothers: A Review

Spatial Process Estimates as Smoothers: A Review Spatial Process Estimates as Smoothers: A Review Soutir Bandyopadhyay 1 Basic Model The observational model considered here has the form Y i = f(x i ) + ɛ i, for 1 i n. (1.1) where Y i is the observed

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Econ 582 Nonparametric Regression

Econ 582 Nonparametric Regression Econ 582 Nonparametric Regression Eric Zivot May 28, 2013 Nonparametric Regression Sofarwehaveonlyconsideredlinearregressionmodels = x 0 β + [ x ]=0 [ x = x] =x 0 β = [ x = x] [ x = x] x = β The assume

More information

Inversion Base Height. Daggot Pressure Gradient Visibility (miles)

Inversion Base Height. Daggot Pressure Gradient Visibility (miles) Stanford University June 2, 1998 Bayesian Backtting: 1 Bayesian Backtting Trevor Hastie Stanford University Rob Tibshirani University of Toronto Email: trevor@stat.stanford.edu Ftp: stat.stanford.edu:

More information

Time Series and Forecasting Lecture 4 NonLinear Time Series

Time Series and Forecasting Lecture 4 NonLinear Time Series Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Introduction to Regression

Introduction to Regression Introduction to Regression Chad M. Schafer May 20, 2015 Outline General Concepts of Regression, Bias-Variance Tradeoff Linear Regression Nonparametric Procedures Cross Validation Local Polynomial Regression

More information

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

STA216: Generalized Linear Models. Lecture 1. Review and Introduction STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general

More information

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R 2nd Edition Brian S. Everitt and Torsten Hothorn CHAPTER 7 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, Colonic

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information

Penalized Regression

Penalized Regression Penalized Regression Deepayan Sarkar Penalized regression Another potential remedy for collinearity Decreases variability of estimated coefficients at the cost of introducing bias Also known as regularization

More information

Introduction to Nonparametric Regression

Introduction to Nonparametric Regression Introduction to Nonparametric Regression Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota)

More information

vgam Family Functions for Log-linear Models

vgam Family Functions for Log-linear Models vgam Family Functions for Log-linear Models T. W. Yee October 30, 2006 Beta Version 0.6-5 Thomas W. Yee Department of Statistics, University of Auckland, New Zealand yee@stat.auckland.ac.nz http://www.stat.auckland.ac.nz/

More information

Regression: Lecture 2

Regression: Lecture 2 Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and

More information

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20 Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)

More information

Introduction to Regression

Introduction to Regression Introduction to Regression Chad M. Schafer cschafer@stat.cmu.edu Carnegie Mellon University Introduction to Regression p. 1/100 Outline General Concepts of Regression, Bias-Variance Tradeoff Linear Regression

More information

Introduction. Linear Regression. coefficient estimates for the wage equation: E(Y X) = X 1 β X d β d = X β

Introduction. Linear Regression. coefficient estimates for the wage equation: E(Y X) = X 1 β X d β d = X β Introduction - Introduction -2 Introduction Linear Regression E(Y X) = X β +...+X d β d = X β Example: Wage equation Y = log wages, X = schooling (measured in years), labor market experience (measured

More information

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,

More information

Lecture 5: LDA and Logistic Regression

Lecture 5: LDA and Logistic Regression Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant

More information

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego Model-free prediction intervals for regression and autoregression Dimitris N. Politis University of California, San Diego To explain or to predict? Models are indispensable for exploring/utilizing relationships

More information

A Nonparametric Monotone Regression Method for Bernoulli Responses with Applications to Wafer Acceptance Tests

A Nonparametric Monotone Regression Method for Bernoulli Responses with Applications to Wafer Acceptance Tests A Nonparametric Monotone Regression Method for Bernoulli Responses with Applications to Wafer Acceptance Tests Jyh-Jen Horng Shiau (Joint work with Shuo-Huei Lin and Cheng-Chih Wen) Institute of Statistics

More information

Model checking overview. Checking & Selecting GAMs. Residual checking. Distribution checking

Model checking overview. Checking & Selecting GAMs. Residual checking. Distribution checking Model checking overview Checking & Selecting GAMs Simon Wood Mathematical Sciences, University of Bath, U.K. Since a GAM is just a penalized GLM, residual plots should be checked exactly as for a GLM.

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Lecture 3: Statistical Decision Theory (Part II)

Lecture 3: Statistical Decision Theory (Part II) Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Generalized linear models

Generalized linear models Generalized linear models Søren Højsgaard Department of Mathematical Sciences Aalborg University, Denmark October 29, 202 Contents Densities for generalized linear models. Mean and variance...............................

More information

An Introduction to GAMs based on penalized regression splines. Simon Wood Mathematical Sciences, University of Bath, U.K.

An Introduction to GAMs based on penalized regression splines. Simon Wood Mathematical Sciences, University of Bath, U.K. An Introduction to GAMs based on penalied regression splines Simon Wood Mathematical Sciences, University of Bath, U.K. Generalied Additive Models (GAM) A GAM has a form something like: g{e(y i )} = η

More information

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

More information

Spatially Adaptive Smoothing Splines

Spatially Adaptive Smoothing Splines Spatially Adaptive Smoothing Splines Paul Speckman University of Missouri-Columbia speckman@statmissouriedu September 11, 23 Banff 9/7/3 Ordinary Simple Spline Smoothing Observe y i = f(t i ) + ε i, =

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 9: Basis Expansions Department of Statistics & Biostatistics Rutgers University Nov 01, 2011 Regression and Classification Linear Regression. E(Y X) = f(x) We want to learn

More information

Linear Regression Model. Badr Missaoui

Linear Regression Model. Badr Missaoui Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus

More information

STAT 704 Sections IRLS and Bootstrap

STAT 704 Sections IRLS and Bootstrap STAT 704 Sections 11.4-11.5. IRLS and John Grego Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1 / 14 LOWESS IRLS LOWESS LOWESS (LOcally WEighted Scatterplot Smoothing)

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Kneib, Fahrmeir: Supplement to Structured additive regression for categorical space-time data: A mixed model approach Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach" Sonderforschungsbereich 386, Paper 43 (25) Online unter: http://epub.ub.uni-muenchen.de/

More information

Generalized Linear Models Introduction

Generalized Linear Models Introduction Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,

More information

Generalized additive modelling of hydrological sample extremes

Generalized additive modelling of hydrological sample extremes Generalized additive modelling of hydrological sample extremes Valérie Chavez-Demoulin 1 Joint work with A.C. Davison (EPFL) and Marius Hofert (ETHZ) 1 Faculty of Business and Economics, University of

More information

Estimation of cumulative distribution function with spline functions

Estimation of cumulative distribution function with spline functions INTERNATIONAL JOURNAL OF ECONOMICS AND STATISTICS Volume 5, 017 Estimation of cumulative distribution function with functions Akhlitdin Nizamitdinov, Aladdin Shamilov Abstract The estimation of the cumulative

More information

Generalized Linear Models: An Introduction

Generalized Linear Models: An Introduction Applied Statistics With R Generalized Linear Models: An Introduction John Fox WU Wien May/June 2006 2006 by John Fox Generalized Linear Models: An Introduction 1 A synthesis due to Nelder and Wedderburn,

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.1 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Generalised linear models. Response variable can take a number of different formats

Generalised linear models. Response variable can take a number of different formats Generalised linear models Response variable can take a number of different formats Structure Limitations of linear models and GLM theory GLM for count data GLM for presence \ absence data GLM for proportion

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates 2011-03-16 Contents 1 Generalized Linear Mixed Models Generalized Linear Mixed Models When using linear mixed

More information

Linear Methods for Prediction

Linear Methods for Prediction This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00 Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section

More information

Diagnostics and Transformations Part 2

Diagnostics and Transformations Part 2 Diagnostics and Transformations Part 2 Bivariate Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University Multilevel Regression Modeling, 2009 Diagnostics

More information

Variable Selection and Model Choice in Survival Models with Time-Varying Effects

Variable Selection and Model Choice in Survival Models with Time-Varying Effects Variable Selection and Model Choice in Survival Models with Time-Varying Effects Boosting Survival Models Benjamin Hofner 1 Department of Medical Informatics, Biometry and Epidemiology (IMBE) Friedrich-Alexander-Universität

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

LWP. Locally Weighted Polynomials toolbox for Matlab/Octave

LWP. Locally Weighted Polynomials toolbox for Matlab/Octave LWP Locally Weighted Polynomials toolbox for Matlab/Octave ver. 2.2 Gints Jekabsons http://www.cs.rtu.lv/jekabsons/ User's manual September, 2016 Copyright 2009-2016 Gints Jekabsons CONTENTS 1. INTRODUCTION...3

More information

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs Outline Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team UseR!2009,

More information

Consider fitting a model using ordinary least squares (OLS) regression:

Consider fitting a model using ordinary least squares (OLS) regression: Example 1: Mating Success of African Elephants In this study, 41 male African elephants were followed over a period of 8 years. The age of the elephant at the beginning of the study and the number of successful

More information

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team University of

More information

Statistical Machine Learning Hilary Term 2018

Statistical Machine Learning Hilary Term 2018 Statistical Machine Learning Hilary Term 2018 Pier Francesco Palamara Department of Statistics University of Oxford Slide credits and other course material can be found at: http://www.stats.ox.ac.uk/~palamara/sml18.html

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Introduction to Linear regression analysis. Part 2. Model comparisons

Introduction to Linear regression analysis. Part 2. Model comparisons Introduction to Linear regression analysis Part Model comparisons 1 ANOVA for regression Total variation in Y SS Total = Variation explained by regression with X SS Regression + Residual variation SS Residual

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

Generalized Linear Models 1

Generalized Linear Models 1 Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter

More information

Penalized Splines, Mixed Models, and Recent Large-Sample Results

Penalized Splines, Mixed Models, and Recent Large-Sample Results Penalized Splines, Mixed Models, and Recent Large-Sample Results David Ruppert Operations Research & Information Engineering, Cornell University Feb 4, 2011 Collaborators Matt Wand, University of Wollongong

More information

Generalized Linear Models. Kurt Hornik

Generalized Linear Models. Kurt Hornik Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general

More information

Linear Regression With Special Variables

Linear Regression With Special Variables Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates Madison January 11, 2011 Contents 1 Definition 1 2 Links 2 3 Example 7 4 Model building 9 5 Conclusions 14

More information

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas 0 0 5 Motivation: Regression discontinuity (Angrist&Pischke) Outcome.5 1 1.5 A. Linear E[Y 0i X i] 0.2.4.6.8 1 X Outcome.5 1 1.5 B. Nonlinear E[Y 0i X i] i 0.2.4.6.8 1 X utcome.5 1 1.5 C. Nonlinearity

More information

Generalized Linear Models I

Generalized Linear Models I Statistics 203: Introduction to Regression and Analysis of Variance Generalized Linear Models I Jonathan Taylor - p. 1/16 Today s class Poisson regression. Residuals for diagnostics. Exponential families.

More information

Boosting Methods: Why They Can Be Useful for High-Dimensional Data

Boosting Methods: Why They Can Be Useful for High-Dimensional Data New URL: http://www.r-project.org/conferences/dsc-2003/ Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003) March 20 22, Vienna, Austria ISSN 1609-395X Kurt Hornik,

More information

Time Series. Anthony Davison. c

Time Series. Anthony Davison. c Series Anthony Davison c 2008 http://stat.epfl.ch Periodogram 76 Motivation............................................................ 77 Lutenizing hormone data..................................................

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

Logistic Regression and Generalized Linear Models

Logistic Regression and Generalized Linear Models Logistic Regression and Generalized Linear Models Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/2 Topics Generative vs. Discriminative models In

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 12: Logistic regression (v1) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 30 Regression methods for binary outcomes 2 / 30 Binary outcomes For the duration of this

More information

Generalized Linear Models in R

Generalized Linear Models in R Generalized Linear Models in R NO ORDER Kenneth K. Lopiano, Garvesh Raskutti, Dan Yang last modified 28 4 2013 1 Outline 1. Background and preliminaries 2. Data manipulation and exercises 3. Data structures

More information