Partial Generalized Additive Models
1 Partial Generalized Additive Models: An Information-theoretic Approach for Selecting Variables and Avoiding Concurvity
Hong Gu (Department of Mathematics and Statistics, Dalhousie University)
Mu Zhu (Department of Statistics and Actuarial Science, University of Waterloo)
March 16, 2009
2 Outline
Introduction: concurvity and the interpretation of GAM; an illustrative example
Sequential maximization of mutual information: GAM and pGAM
Partial generalized additive models (pGAM)
Simulation and examples: a simulation study; ozone data; air pollution and mortality data
Summary and discussion
3 Generalized additive models (GAM)
Response variable: Y; predictor variables: X = (X_1, ..., X_p).
GAM: E(Y | X) = h(η(X)) = h(f_0 + f_1(X_1) + ... + f_p(X_p)).
The response variable Y is from an exponential family distribution and h is a known monotonic link function.
GAM is popular due to:
its simple form and the intuitive interpretation of the effect of each individual predictor on the response variable;
its predictive accuracy.
5 Concurvity and the interpretation of GAM
However, the interpretation is not straightforward: the contributions from different variables are generally not independent.
Concurvity: strong functional relationships among the predictor variables; the analogue of collinearity (Hastie and Tibshirani, 1990; Donnell, Buja and Stuetzle, 1994).
6 The seminal contribution of Simon Wood
Concurvity is dealt with by controlling the complexity or smoothness of each fitted function: shrinkage methods.
Wood (2000): a general methodology to efficiently select multiple smoothing parameters.
Wood (2004): solved a difficult numerical rank-deficiency problem; showed that his methods provide much more stable functional reconstruction and very competitive MSE.
Wood (2006): gam in the mgcv package, the current state of the art of GAM fitting.
But what about model interpretation when concurvity structures exist? And model simplification and variable selection in GAM?
7 An illustrative example
X_1, X_2, X_3, X_4 iid U(0,1); X_5 = 2X_1^3 + N(0, σ_1^2).
Y = (5 e^{X_1} + 2X_1^3) + X_3 + N(0, σ_2^2).
What is the effect of X_1 on Y?
[Figure: the function f(x) = 5 e^x + 2x^3 on [0, 1].]
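As a concrete reference, here is a minimal Python sketch of this simulated setup (the function and variable names are my own; the values of σ_1 and σ_2 follow those used later in the simulation study):

```python
import numpy as np

def simulate_example(n=400, sigma1=0.01, sigma2=0.5, seed=0):
    """The illustrative example: X1..X4 iid U(0,1); X5 = 2*X1**3 + noise
    (concurvity with X1); Y = 5*exp(X1) + 2*X1**3 + X3 + noise."""
    rng = np.random.default_rng(seed)
    X4 = rng.uniform(0.0, 1.0, size=(n, 4))               # X1, X2, X3, X4
    x5 = 2.0 * X4[:, 0] ** 3 + rng.normal(0.0, sigma1, n)
    y = (5.0 * np.exp(X4[:, 0]) + 2.0 * X4[:, 0] ** 3
         + X4[:, 2] + rng.normal(0.0, sigma2, n))
    return np.column_stack([X4, x5]), y

X, y = simulate_example()
# With sigma1 small, X5 is nearly a deterministic function of X1:
r = np.corrcoef(X[:, 4], 2.0 * X[:, 0] ** 3)[0, 1]
```

With σ_1 = 0.01, X_5 is essentially a smooth monotone function of X_1, which is exactly the concurvity that confuses a standard GAM fit.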
8 Illustrative example: effects estimated by GAM
[Figure: estimated smooths s(X1, 3.51), s(X2, 1), s(X3, 1.48), s(X4, 1), and s(X5, 1), each plotted against its covariate.]
9 Illustrative example: effects estimated by pGAM
Only X^(1) and X^(3) are included in the final model by pGAM.
[Figure: estimated smooths s(X1, 4) and s(X3, 2).]
10 Mutual information (MI) and its properties
MI provides a good measure of the strength of statistical dependency between random variables. MI is defined as
MI_{XY} = E[ log ( f(X, Y) / ( f_X(X) f_Y(Y) ) ) ].
The following properties make MI a nonlinear analogue of the linear correlation ρ (Brillinger, 2004):
(1) MI_{XY} = 0 iff X is independent of Y.
(2) For the continuous case, MI_{XY} = ∞ if Y = g(X).
(3) Invariance: MI_{XY} = MI_{UV} if u = u(x) and v = v(y) are individually one-to-one measurable transformations.
(4) For the bivariate normal, MI_{XY} = -(1/2) log(1 - ρ_{XY}^2).
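Properties (3) and (4) are easy to check numerically. The sketch below (my own illustration, not an estimator from the talk) estimates MI by quantile-binning both variables, which makes the estimate exactly invariant under monotone one-to-one transforms, and compares it with the bivariate-normal closed form:

```python
import numpy as np

def mi_quantile_binned(x, y, bins=10):
    """Plug-in MI estimate (in nats) after cutting x and y into equal-count
    quantile bins.  Quantile binning makes the estimate exactly invariant
    under monotone one-to-one transforms of each variable (property 3)."""
    qx = np.searchsorted(np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1]), x)
    qy = np.searchsorted(np.quantile(y, np.linspace(0, 1, bins + 1)[1:-1]), y)
    joint = np.zeros((bins, bins))
    np.add.at(joint, (qx, qy), 1.0)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(1)
rho = 0.8
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=20000)
x, y = z[:, 0], z[:, 1]

closed_form = -0.5 * np.log(1 - rho ** 2)       # property (4), ~0.511 nats
est = mi_quantile_binned(x, y)                  # a bit below closed_form
est_t = mi_quantile_binned(np.exp(x), y ** 3)   # property (3): unchanged
```

The binned estimate sits somewhat below the closed form because discretization discards information; the transformed estimate matches the untransformed one exactly.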
16 MI and GAM
MI is the amount of information in X that can be used to reduce the uncertainty of Y, or equivalently the amount of information in Y that can be used to reduce the uncertainty in X:
MI(Y; X_1, ..., X_p) = H(Y) - H(Y | X) = H(X) - H(X | Y).
Function approximation: find η(X) to maximize the MI between Y and η(X) = f(X_1, ..., X_p).
GAM uses a first- (or lower-) order ANOVA-like decomposition of E(Y | X_1, ..., X_p) = f(x_1, ..., x_p) to deal with the curse of dimensionality.
Note: the maximum value of MI(Y; η(X)) is invariant to the choice of the link function.
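For a discrete joint distribution, the identity MI(Y; X) = H(Y) - H(Y|X) = H(X) - H(X|Y) can be verified directly. A small sketch with toy numbers of my own:

```python
import numpy as np

# A toy discrete joint distribution p(x, y): rows index x in {0,1,2},
# columns index y in {0,1}.  (Numbers are illustrative only.)
p = np.array([[0.30, 0.10],
              [0.05, 0.25],
              [0.10, 0.20]])

def H(dist):
    """Shannon entropy in nats of a probability vector."""
    dist = dist[dist > 0]
    return float(-(dist * np.log(dist)).sum())

px, py = p.sum(axis=1), p.sum(axis=0)
H_y_given_x = sum(px[i] * H(p[i, :] / px[i]) for i in range(p.shape[0]))
H_x_given_y = sum(py[j] * H(p[:, j] / py[j]) for j in range(p.shape[1]))

mi_a = H(py) - H_y_given_x                # H(Y) - H(Y|X)
mi_b = H(px) - H_x_given_y                # H(X) - H(X|Y)
mi_c = H(px) + H(py) - H(p.ravel())       # H(X) + H(Y) - H(X,Y)
```

All three expressions agree, and the MI is strictly positive because the rows of p are not proportional (X and Y are dependent).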
20 MI and GAM
Generally, for any η(X), MI(Y; X_1, ..., X_p) >= MI(Y; η(X)). If Y ⊥ X given η(X), then MI(Y; X_1, ..., X_p) = MI(Y; η(X)).
The chain rule for MI:
MI(Y; X_1, ..., X_p) = MI(Y; X_1) + MI(Y; X_2 | X_1) + ... + MI(Y; X_p | X_{p-1}, ..., X_1).
Finding f_1(X_1) to approach MI(Y; X_1): max_η E(l(Y, η(X_1))), i.e., finding E(Y | X_1) = f_1(X_1).
Denote Y = f_1(X_1) + Z_Y^1, where Z_Y^1 ⊥ X_1: then MI(Y; X_2 | X_1) = MI(Z_Y^1; X_2).
This leads to the familiar back-fitting algorithm.
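The back-fitting idea can be sketched with any univariate smoother. Below, a crude bin-average smoother stands in for the penalized splines used in practice (all names and the smoother choice are my own assumptions):

```python
import numpy as np

def bin_smooth(x, r, bins=20):
    """Crude bin-average smoother: estimate E(r | x) as a step function
    (a stand-in for the spline smoothers used in practice)."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    idx = np.searchsorted(edges, x)
    means = np.array([r[idx == b].mean() if np.any(idx == b) else 0.0
                      for b in range(bins)])
    return means[idx]

def backfit(X, y, iters=20, bins=20):
    """Gauss-Seidel back-fitting for y ~ f0 + sum_j f_j(X_j):
    cycle over covariates, smoothing the partial residual each time."""
    n, p = X.shape
    f0 = y.mean()
    F = np.zeros((n, p))                     # current fitted f_j(x_ij)
    for _ in range(iters):
        for j in range(p):
            partial = y - f0 - F.sum(axis=1) + F[:, j]
            F[:, j] = bin_smooth(X[:, j], partial, bins)
            F[:, j] -= F[:, j].mean()        # center each f_j for identifiability
    return f0, F

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(1000, 2))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, 1000)
f0, F = backfit(X, y)
resid = y - f0 - F.sum(axis=1)
```

Each pass smooths the partial residual against one covariate at a time, which is exactly the sequential accounting of MI(Y; X_j | previous terms) described above.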
26 MI and pGAM
An alternative way to fit the second term, MI(Y; X_2 | X_1), leads us to a new procedure. Suppose X_2 = g_21(X_1) + X^(2), where X_1 ⊥ X^(2):
MI(Y; X_2 | X_1) = H(X_2 | X_1) - H(X_2 | Y, X_1) = MI(Y; X^(2)).
First, estimate g_21 by smoothing X_2 onto X_1; then fit a (univariate) GAM of Y onto X^(2) = X_2 - g_21(X_1), since X^(2) ⊥ X_1.
This provides a natural way to avoid concurvity and constitutes the main idea of our procedure, pGAM.
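The decorrelation step can be sketched as follows: smooth X_2 onto X_1, keep the residual X^(2), and check that it is nearly uncorrelated with X_1 (the bin smoother and parameter values are my own stand-ins):

```python
import numpy as np

def bin_smooth(x, r, bins=20):
    """Bin-average smoother: estimate E(r | x) as a step function."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    idx = np.searchsorted(edges, x)
    means = np.array([r[idx == b].mean() if np.any(idx == b) else 0.0
                      for b in range(bins)])
    return means[idx]

rng = np.random.default_rng(3)
n = 2000
x1 = rng.uniform(0, 1, n)
x2 = 2 * x1 ** 3 + rng.normal(0, 0.05, n)    # strong concurvity with x1

g21_hat = bin_smooth(x1, x2)                 # estimate g_21 by smoothing x2 on x1
x2_partial = x2 - g21_hat                    # working variable X^(2)

raw_corr = abs(np.corrcoef(x1, x2)[0, 1])              # large: concurve
partial_corr = abs(np.corrcoef(x1, x2_partial)[0, 1])  # near zero after partialing
```

The residual X^(2) carries whatever information X_2 has about Y beyond X_1, so fitting Y on it estimates the term MI(Y; X_2 | X_1) without the concurvity.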
30 An explicit variable selection procedure
At each step, choose to enter the variable whose MI with Y is the largest.
Stop when the MI between Y and each of the remaining input variables becomes fairly small.
Covariates deemed important a priori are always included in the initial model.
31 Indirect estimation of MI
Direct estimation of MI is not a trivial problem: MI(X; Y) = H(X) + H(Y) - H(X, Y).
Instead, we work with a proxy of MI(Y; X): MI*(Y; X) = max_η MI(Y; η(X)).
If η(X) is sufficient for Y, then MI(Y; η(X)) = MI(Y; X).
MI(Y; η(X)) = E(l(Y, η(X))) - E log f_Y(Y).
Thus only the conditional log-likelihood is needed to get the right order of the covariates at each step.
34 The pGAM algorithm: initialization
1. Start with a null model m by fitting a GAM of Y onto a constant; let D_0 be the deviance of m.
2. Center all X_j's to have mean zero; let X_w = {X^(j) = X_j : j = 1, 2, ..., p} be the set of working variables.
3. Set t = 1.
36 The pGAM algorithm: while t <= p
1. Fit a (univariate) GAM of Y onto each working variable in X_w and record the deviance of each resulting GAM.
2. Suppose X^(i) in X_w is the variable whose corresponding (univariate) GAM has the largest log-likelihood, or the smallest deviance. Add X^(i) into m; record the resulting deviance D_new; and let X_w <- X_w \ {X^(i)}.
3. Test whether D_new is a significant improvement over D_0, e.g., with an F-test for Gaussian or a χ²-test for binomial or Poisson responses. If insignificant, remove X^(i) from m and output m.
4. For every X^(j) in X_w (j != i), fit the model X^(j) = g_ji(X^(i)) + ε_j by smoothing X^(j) onto X^(i); record the fitted functions g_ji.
5. Let t <- t + 1; D_0 <- D_new; and X^(j) <- X^(j) - g_ji(X^(i)).
41 The pGAM algorithm: output
Output: the model m and the fitted functions g_ji.
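Putting the pieces together, here is a hedged sketch of the whole loop for a Gaussian response. Two simplifications relative to the slides: the current residual is smoothed instead of jointly refitting the model m, and a simple relative deviance-drop threshold replaces the F-test; the smoother is again a crude bin average, and all names are mine:

```python
import numpy as np

def bin_smooth(x, r, bins=20):
    """Bin-average smoother (stand-in for a spline): fitted E(r | x)."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    idx = np.searchsorted(edges, x)
    means = np.array([r[idx == b].mean() if np.any(idx == b) else 0.0
                      for b in range(bins)])
    return means[idx]

def pgam_gaussian(X, y, rel_tol=0.025, bins=20):
    """Sketch of the pGAM forward-selection loop (Gaussian response,
    deviance = RSS).  Simplified: smooths the current residual rather
    than refitting m jointly, and uses a relative deviance-drop
    threshold in place of the F-test from the slides."""
    n, p = X.shape
    Xw = X - X.mean(axis=0)                   # centered working variables
    remaining, selected = list(range(p)), []
    fitted = np.full(n, y.mean())             # null model m
    D0 = ((y - fitted) ** 2).sum()            # null deviance
    while remaining:
        resid = y - fitted
        # steps 1-2: univariate fit on each working variable, pick the best
        devs = {j: ((resid - bin_smooth(Xw[:, j], resid, bins)) ** 2).sum()
                for j in remaining}
        i = min(devs, key=devs.get)
        # step 3: stop if the improvement is negligible
        if (D0 - devs[i]) / D0 < rel_tol:
            break
        fitted = fitted + bin_smooth(Xw[:, i], resid, bins)
        selected.append(i)
        remaining.remove(i)
        # steps 4-5: partial X^(i) out of every remaining working variable
        # (the full algorithm also records the fitted g_ji for output)
        for j in remaining:
            Xw[:, j] = Xw[:, j] - bin_smooth(Xw[:, i], Xw[:, j], bins)
        D0 = devs[i]
    return selected, fitted

# The illustrative example: X5 is concurve with X1; only X1 and X3 matter.
rng = np.random.default_rng(4)
n = 2000
U = rng.uniform(0, 1, size=(n, 4))
x5 = 2 * U[:, 0] ** 3 + rng.normal(0, 0.01, n)
X = np.column_stack([U, x5])
y = 5 * np.exp(U[:, 0]) + 2 * U[:, 0] ** 3 + U[:, 2] + rng.normal(0, 0.5, n)
selected, fitted = pgam_gaussian(X, y)
```

Because X_5 is nearly a monotone function of X_1, either may be entered first; once one of them is in, partialing out leaves the other with almost no remaining information, so the pure-noise variables X_2 and X_4 stay out of the model.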
42 Simulation study: variable selection
X_1, X_2, X_3, X_4 iid U(0,1); X_5 = 2X_1^3 + N(0, σ_1^2); Y = (5 e^{X_1} + 2X_1^3) + X_3 + N(0, σ_2^2).
[Table: number of times each variable combination, (1,3), (1,3,5), (1,3,*), (1), (5), (5,3), or other, is selected by pGAM out of 500 simulations, for each combination of concurvity strength (strong σ_1 = 0.01, medium σ_1 = 0.50, weak σ_1 = 0.90) and SNR (high σ_2 = 0.1, medium σ_2 = 0.5, low σ_2 = 1.0). A star (*) means a variable other than X_1, X_3, or X_5.]
43 Simulation study: prediction on test sets
RMSE and RPSE. DIFF-RMSE = RMSE(GAM) - RMSE(pGAM); positive differences indicate that pGAM is (slightly) better.
[Table: mean (stdev) of DIFF-RMSE and DIFF-RPSE for each concurvity strength (strong σ_1 = 0.01, medium σ_1 = 0.50, weak σ_1 = 0.90) and SNR level (H σ_2 = 0.1, M σ_2 = 0.5, L σ_2 = 1.0).]
44 Simulation study: estimated functional effects by pGAM
Strong concurvity (σ_1 = 0.01) and medium SNR (σ_2 = 0.5). Pointwise mean and 90% CI, based on the runs in which pGAM chooses the right variable combination (489 out of 500 simulations).
[Figure: mean and 90% CI for s(X1) and s(X3).]
45 Simulation study: estimated functional effects by GAM
Strong concurvity (σ_1 = 0.01) and medium SNR (σ_2 = 0.5). Pointwise mean together with the 5th and 95th percentiles (500 simulations).
[Figure: mean and 90% CI for s(X1) through s(X5).]
46 Ozone data: variables selected by pGAM
Ozone: Gaussian response. Variables in the ozone data set (numbers in parentheses give the order of selection by pGAM):
ozone: logarithm of ozone concentration (log-ppm)
temp (1): Sandburg Air Force Base temperature
ibh (2): inversion base height
dpg (6): Daggett pressure gradient
vis (5): visibility in miles
vh: Vandenberg 500 millibar pressure height
humidity (3): humidity (%)
ibt: inversion base temperature
wind: wind speed (mph)
doy (4): day of the year
47 GAM, pGAM, and GAM with the same df as pGAM (1)
[Figure: estimated smooths for temp, ibh, and humidity under the three fits: s(temp, 3.79) vs. s(temp, 4); s(ibh, 2.75) vs. s(ibh, 5); s(humidity, 2.38) vs. s(humidity, 5).]
48 GAM, pGAM, and GAM with the same df as pGAM (2)
[Figure: estimated smooths for doy, vis, and dpg under the three fits: s(doy, 4.55) vs. s(doy, 4); s(vis, 5.51) vs. s(vis, 7); s(dpg, 3.3) vs. s(dpg, 3).]
49 Ozone data: differences in the effects of covariates between GAM and pGAM
The effect of temp is much closer to a simple linear effect in pGAM.
For GAM, humidity is not a significant covariate (p-value = 0.06 for the default GAM and p-value = 0.13 for the GAM using only six variables). For pGAM, after removing the partial effects of temp and ibh, the first two variables selected, humidity becomes a significant covariate. Visually, the effect of humidity is much less flat in pGAM.
The effects of doy and dpg estimated by pGAM peak at different locations than those estimated by GAM.
52 Comparison with the analysis by Donnell et al. (1994)
Concurvity found by Donnell et al. (1994):
Given ibh, there is a positive relationship between temp and ibt.
Given temp, there is a negative relationship between ibh and ibt.
The covariates temp and vh tend to increase together.
There is a strong and complex (nonlinear) relationship involving ozone, temp, dpg, and doy.
pGAM has successfully detected and removed the first two concurvities. The third concurvity involves ozone: this suggests temp, dpg, and doy are all important covariates, and the pGAM model includes all of them.
57 Comparison with Breiman and Friedman (1985): alternating conditional expectations (ACE)
ACE includes temp, ibh, dpg, vis, and doy; pGAM includes all of these plus humidity.
Using doy as a single input, its effect peaks in late July and early August: the highest pollution days occur from July to September.
In the ACE model, the peak was shifted to the beginning of May. "This was puzzling to [them], since the highest pollution days occur from July to September." The ACE paper's interpretation: doy may serve as "a partial surrogate for hours of daylight before and during the morning commuter rush."
pGAM: the peak effect of doy occurs in late July. pGAM suggests the shift is due to subtle concurvities.
63 Partial effects among covariates estimated by pGAM
[Figure: each of ibh, humidity, doy, vis, dpg, vh, wind, and ibt smoothed onto temp; and each of humidity, doy, vis, dpg, vh, wind, and ibt smoothed onto ibh.]
64 Partial effects among covariates estimated by pGAM (continued)
[Figure: doy, vis, dpg, vh, wind, and ibt smoothed onto humidity; vis, dpg, vh, wind, and ibt onto doy; dpg, vh, wind, and ibt onto vis; vh, wind, and ibt onto dpg.]
65 Air pollution and mortality data: Philadelphia, 1995 to 2000
The National Morbidity, Mortality, and Air Pollution Study (NMMAPS): daily mortality, air pollution, and weather data. http//
Model: log(λ_t) = f(t) + Σ_{j=1}^{d} g_j(x_jt) + h(z_t). Mortality: Poisson response.
Variables in the Philadelphia air pollution data set:
y_t (mortality): number of non-accidental deaths in the 65-to-75 age group
t (time): measured in days, i.e., 1, 2, ..., 2191
z_t (pollutant): daily NO_2 concentration
x_1t (temp): average daily temperature
x_2t (dptp): daily dewpoint temperature
66 Philadelphia air pollution and mortality data set: approximate significance of smooth terms
Model: mortality ~ s(time, df = 20) + s(temp, df = 3) + s(dptp, df = 3) + s(pollutant, df = 2).
[Table: p-values under GAM and under pGAM for each term: g_1(x_1t) = s(temp, df = 3), g_2(x_2t) = s(dptp, df = 3), h(z_t) = s(pollutant, df = 2), f(t) = s(time, df = 20).]
67 Philadelphia air pollution and mortality data: effects of covariates estimated by GAM
[Figure: s(temp, 3), s(dptp, 3), s(pollutant, 2), and s(time, 20).]
68 Philadelphia air pollution and mortality data: effects of covariates estimated by pGAM
Only two covariates, t (time) and z_t (pollutant), are selected.
[Figure: s(time, 20) and s(pollutant, 2).]
69 Philadelphia air pollution and mortality data: partial effects estimated by pGAM
[Figure: temp, dptp, and pollutant smoothed onto time; temp and dptp smoothed onto pollutant.]
70 Philadelphia air pollution and mortality data: summary
Overall, our analysis suggests: (i) mortality for Philadelphia residents between the ages of 65 and 75 was decreasing during the period 1995 to 2000; (ii) mortality for this population was highest in winter and lowest in summer; and (iii) after adjusting for the strong seasonal effect, air pollution in the form of nitrogen dioxide still appeared to significantly increase mortality for this population.
71 Summary and discussions. The back-fitting algorithm for GAM can be conceptually viewed as a sequential method for maximizing mutual information (MI). pgam gives better estimates of the covariates' functional effects when concurvity structures exist. A useful observation about estimating MI: first maximize the conditional log-likelihood of Y given X, then estimate the entropy of Y alone. Research on fitting GAMs with unknown link functions (e.g., Horowitz, 2001; Cadarso-Suarez et al., 2005): by the invariance property of MI, the link function can be chosen freely to facilitate model interpretation without affecting goodness-of-fit. If the function η is not fully flexible, however, the choice of link function does make a difference, as in the GLM.
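The MI-estimation observation above can be made concrete in the simplest Gaussian case, where I(X; Y) = H(Y) - H(Y|X) has the closed form -0.5 log(1 - rho^2) to check against. A hypothetical sketch on synthetic data, with ordinary least squares playing the role of maximizing the conditional log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(1)

# Jointly Gaussian (X, Y) with correlation rho; the true mutual
# information is -0.5 * log(1 - rho^2), which we compare against.
n, rho = 100_000, 0.6
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Step 1: maximize the conditional Gaussian log-likelihood of Y given X
# (here just OLS); the residual variance gives an estimate of H(Y | X).
slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)
h_y_given_x = 0.5 * np.log(2 * np.pi * np.e * resid.var())

# Step 2: estimate the entropy of Y alone from its marginal variance.
h_y = 0.5 * np.log(2 * np.pi * np.e * y.var())

mi_hat = h_y - h_y_given_x
mi_true = -0.5 * np.log(1 - rho**2)
```

The same decomposition is what lets the conditional model (the GAM fit) and the marginal entropy of Y be handled in two separate, simpler steps.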