Statistics 203: Introduction to Regression and Analysis of Variance
Course review

Jonathan Taylor

Today

Review / overview of what we learned.

General themes in regression models

Specifying regression models. What is the joint (conditional) distribution of all outcomes given all covariates? Are outcomes independent (conditional on covariates)? If not, what is an appropriate model?

Fitting the models. Once a model is specified, how are we going to estimate the parameters? Is there an algorithm or some existing software to fit the model?

Comparing regression models. Inference for coefficients in the model: are some zero (i.e. is a smaller model better)? What if there are two competing models for the data? Why would one be preferable to the other? What if there are many models for the data? How do we compare them?

Simple linear regression model

Only one covariate:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad 1 \le i \le n,$$
where the errors $\varepsilon_i$ are independent $N(0, \sigma^2)$.
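
A minimal sketch of fitting this model by least squares with numpy; the data, parameter values, and seed below are simulated for illustration and are not from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from Y_i = beta_0 + beta_1 X_i + eps_i
n, beta0, beta1, sigma = 50, 1.0, 2.0, 0.5
x = rng.uniform(0, 10, n)
y = beta0 + beta1 * x + rng.normal(0, sigma, n)

# Least squares fit: np.polyfit returns [slope, intercept] for degree 1
b1_hat, b0_hat = np.polyfit(x, y, 1)
resid = y - (b0_hat + b1_hat * x)
sigma2_hat = resid @ resid / (n - 2)   # unbiased estimate of sigma^2
print(b0_hat, b1_hat, sigma2_hat)
```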

Multiple linear regression model

$$Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_{p-1} X_{i,p-1} + \varepsilon_i, \qquad 1 \le i \le n,$$
where the errors $\varepsilon_i$ are independent $N(0, \sigma^2)$.

The $\beta_j$'s are (partial) regression coefficients.

Special cases: polynomial / spline regression models, where the extra columns are functions of one covariate.

ANOVA & categorical variables

Generalization of two-sample tests.

One-way (fixed):
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad 1 \le i \le r, \; 1 \le j \le n.$$
The $\alpha$'s are constants to be estimated; errors $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$.

Two-way (fixed):
$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad 1 \le i \le r, \; 1 \le j \le m, \; 1 \le k \le n.$$
The $\alpha$'s, $\beta$'s, and $(\alpha\beta)$'s are constants to be estimated; errors $\varepsilon_{ijk}$ are independent $N(0, \sigma^2)$.

Experimental design: when balanced layouts are impossible, which is the best design?
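
As an illustration (not from the original slides), the one-way fixed-effects F test can be run with scipy; the three simulated groups and their means below are assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Three groups (r = 3) with n = 10 observations each; group means differ
groups = [rng.normal(mu, 1.0, 10) for mu in (0.0, 0.5, 1.5)]

# One-way fixed-effects ANOVA: tests H0: alpha_1 = ... = alpha_r = 0
F, p = stats.f_oneway(*groups)
print(F, p)
```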

Generalized linear models

Non-Gaussian errors. Binary outcomes: logistic regression (or probit). Count outcomes: Poisson regression.

A link and a variance function determine a GLM.

Link: $g(E(Y_i)) = g(\mu_i) = \eta_i = x_i^T\beta = X_{i0}\beta_0 + \dots + X_{i,p-1}\beta_{p-1}$.

Variance function: $\mathrm{Var}(Y_i) = V(\mu_i)$.
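
A sketch of fitting a logistic GLM using statsmodels (a library assumption; the course did not prescribe software, and the simulated data and coefficients are illustrative). statsmodels fits GLMs by IRLS, matching the maximum likelihood slide later in this review:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Simulated binary outcomes with the logit link g(mu) = log(mu / (1 - mu))
n = 200
X = sm.add_constant(rng.normal(size=(n, 2)))      # columns: 1, X1, X2
eta = X @ np.array([-0.5, 1.0, 2.0])              # linear predictor
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# Logistic regression; statsmodels maximizes the likelihood via IRLS
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.params)
```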

Nonlinear regression models

The regression function depends on the parameters in a nonlinear fashion:
$$Y_i = f(X_{i1}, \dots, X_{ip};\, \theta_1, \dots, \theta_q) + \varepsilon_i, \qquad 1 \le i \le n,$$
where the errors $\varepsilon_i$ are independent $N(0, \sigma^2)$.
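
For illustration, nonlinear least squares can be run with scipy's curve_fit; the exponential mean function and starting values here are assumptions, not from the notes:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)

# An exponential-decay mean function, nonlinear in theta = (a, b)
def f(x, a, b):
    return a * np.exp(-b * x)

x = np.linspace(0, 5, 60)
y = f(x, 2.0, 0.8) + rng.normal(0, 0.1, x.size)

# Nonlinear least squares (Levenberg-Marquardt by default)
theta_hat, cov = curve_fit(f, x, y, p0=(1.0, 1.0))
print(theta_hat)
```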

Robust regression

Suppose that we have additive noise that is not Gaussian. A likelihood of the form
$$L(\beta \mid Y, X_0, \dots, X_{p-1}) \propto \exp\left( -\rho\left( \frac{Y - \sum_{j=0}^{p-1} \beta_j X_j}{s} \right) \right)$$
leads to the robust regression criterion (to be minimized)
$$\sum_{i=1}^{n} \rho\left( \frac{Y_i - \sum_{j=0}^{p-1} X_{ij}\beta_j}{s} \right).$$
Choosing $\rho$ appropriately downweights large residuals, accommodating errors with heavier tails than the normal distribution.
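
A sketch of M-estimation via IRLS using statsmodels' RLM with Huber's $\rho$ (the library choice and simulated heavy-tailed data are assumptions):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)

# Data with heavy-tailed (t with 2 df) noise
n = 100
X = sm.add_constant(rng.normal(size=(n, 1)))
y = X @ np.array([1.0, 2.0]) + rng.standard_t(2, n)

# M-estimation via IRLS with Huber's rho; compare with plain OLS
huber_fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
ols_fit = sm.OLS(y, X).fit()
print(huber_fit.params, ols_fit.params)
```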

Random & mixed effects ANOVA

When the levels of the categorical variables in an ANOVA are a sample from a population, the effects should be treated as random.

One-way (random):
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad 1 \le i \le r, \; 1 \le j \le n,$$
where the $\alpha_i \sim N(0, \sigma_\alpha^2)$ are random, independent of the errors $\varepsilon_{ij}$, which are independent $N(0, \sigma^2)$.

This introduces correlation in the $Y$'s:
$$\mathrm{Cov}(Y_{ij}, Y_{i'j'}) = \delta_{ii'}\left( \sigma_\alpha^2 + \delta_{jj'}\,\sigma^2 \right).$$

Mixed linear models

Essentially a model of covariance between observations, based on subject effects. General form:
$$Y_{n\times 1} = X_{n\times p}\,\beta_{p\times 1} + Z_{n\times q}\,\gamma_{q\times 1} + \varepsilon_{n\times 1},$$
where $\varepsilon \sim N(0, \sigma^2 I)$ and $\gamma \sim N(0, D)$ for some covariance $D$. In this model,
$$Y \sim N(X\beta,\; ZDZ' + \sigma^2 I).$$
Covariance is modelled through the random-effect design matrix $Z$ and the covariance $D$.
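
A random-intercept example fit with statsmodels' MixedLM (a library assumption; the simulated grouping structure and variances are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# 20 subjects, 5 observations each; one random intercept per subject
groups = np.repeat(np.arange(20), 5)
gamma = rng.normal(0, 1.0, 20)                    # gamma ~ N(0, D)
X = sm.add_constant(rng.normal(size=(100, 1)))
y = X @ np.array([1.0, 0.5]) + gamma[groups] + rng.normal(0, 0.5, 100)

# Random-intercept mixed model: Y = X beta + Z gamma + eps
fit = sm.MixedLM(y, X, groups=groups).fit()
print(fit.params)
```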

Time series regression models

Another model of covariance between observations, this time based on dependence in time:
$$Y_{n\times 1} = X_{n\times p}\,\beta_{p\times 1} + \varepsilon_{n\times 1}.$$
In these models $\varepsilon \sim N(0, \Sigma)$, where the covariance $\Sigma$ depends on what kind of time series model is used (i.e. which $ARMA(p, q)$ model). For example, if $\varepsilon$ is $AR(1)$ with parameter $\rho$, then $\Sigma_{ij} = \sigma^2 \rho^{|i-j|}$.
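
A short numpy sketch constructing the AR(1) covariance $\Sigma$ and computing the GLS estimate from the least squares slide below (simulated data; the values of $\rho$ and $\sigma^2$ are assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

# AR(1) covariance: Sigma_ij = sigma^2 * rho^|i - j|
n, rho, sigma2 = 50, 0.7, 1.0
idx = np.arange(n)
Sigma = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

# Simulate a regression with AR(1) errors and fit by GLS
X = np.column_stack([np.ones(n), idx / n])
eps = rng.multivariate_normal(np.zeros(n), Sigma)
y = X @ np.array([1.0, 2.0]) + eps
Sinv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ y)
print(beta_gls)
```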

Functional linear model

We talked about a functional two-sample t-test. General form:
$$Y_{i,t} = \beta_{0,t} + \sum_{j=1}^{p-1} X_{ij}\beta_{j,t} + \varepsilon_{i,t},$$
where the noise $\varepsilon_{i,t}$ is a random function, independent across observations (curves) $Y_{i,\cdot}$. Parameter estimates are curves, which leads to nice inference problems for smooth random curves.

Least squares

Multiple linear regression, OLS:
$$\hat\beta = (X^T X)^{-1} X^T Y.$$

Non-constant variance but independent errors, WLS:
$$\hat\beta = (X^T W X)^{-1} X^T W Y, \qquad W = \mathrm{diag}(w_i), \; w_i = 1/\mathrm{Var}(Y_i).$$

General correlation, GLS:
$$\hat\beta = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} Y, \qquad \mathrm{Cov}(Y) = \sigma^2 \Sigma.$$
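
The three estimators translate directly into numpy; this is a sketch (solving the normal equations rather than forming explicit inverses), not course-supplied code:

```python
import numpy as np

def ols(X, y):
    # beta_hat = (X'X)^{-1} X'y, via a linear solve rather than an inverse
    return np.linalg.solve(X.T @ X, X.T @ y)

def wls(X, y, w):
    # w_i = 1 / Var(Y_i); scaling the rows of X by w_i gives X'WX and X'Wy
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def gls(X, y, Sigma):
    # General correlation: whiten with Sigma^{-1} in the normal equations
    Sinv = np.linalg.inv(Sigma)
    return np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ y)
```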

Maximum likelihood

In the Gaussian setting with $\Sigma$ known, least squares is the MLE. In other cases, we needed iterative techniques to solve for the MLE:

- nonlinear regression: iterative projections onto the tangent space;
- robust regression: IRLS with weights determined by $\psi = \rho'$;
- generalized linear models: IRLS, Fisher scoring;
- time series regression models: a two-stage procedure approximates the MLE (one can iterate further, though);
- mixed models: similar techniques (though we skipped the details).

Diagnostics: influence and outliers

Diagnostic plots: added variable plots; QQ plot; residuals vs. fitted; standardized residuals vs. fitted.

Measures of influence: Cook's distance; DFFITS; DFBETAS.

Outlier test with Bonferroni correction.

Techniques are most developed for the multiple linear regression model, but some can be generalized (using whitened residuals).
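
For the multiple linear regression case, these influence measures are available through statsmodels (a library assumption; the data are simulated for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=(40, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 1.0, 40)

fit = sm.OLS(y, X).fit()
infl = fit.get_influence()
cooks_d, _ = infl.cooks_distance      # Cook's distance per observation
dffits, _ = infl.dffits               # DFFITS
dfbetas = infl.dfbetas                # DFBETAS (one column per coefficient)

# Outlier test on studentized residuals with Bonferroni-adjusted p-values
print(fit.outlier_test(method='bonf'))
```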

Penalized regression

We looked at ridge regression, too. A generic example of the bias-variance tradeoff in statistics. Minimize
$$SSE_\lambda(\beta) = \sum_{i=1}^{n}\left( Y_i - \sum_{j=1}^{p-1} X_{ij}\beta_j \right)^2 + \lambda \sum_{j=1}^{p-1} \beta_j^2.$$

Other penalties are possible: the basic idea is that the penalty is a measure of the complexity of the model.

Smoothing spline: ridge regression for scatterplot smoothers.
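
A minimal ridge sketch in numpy, assuming the columns of X have been centered so there is no unpenalized intercept, as in the criterion above:

```python
import numpy as np

def ridge(X, y, lam):
    """Minimize ||y - X beta||^2 + lam * ||beta||^2 via the closed form
    beta_hat = (X'X + lam I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Larger $\lambda$ shrinks the coefficients toward zero, trading increased bias for reduced variance.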

Hypothesis tests: multiple linear regression

Multiple linear regression: the reduced model $R$ has $j$ fewer coefficients than the full model $F$; equivalently, there are $j$ linear constraints on the $\beta$'s.
$$F = \frac{\bigl(SSE(R) - SSE(F)\bigr)/j}{SSE(F)/(n-p)} \sim F_{j,\,n-p} \quad \text{(if } H_0 \text{ is true)}.$$
Reject $H_0$: "$R$ is true" at level $\alpha$ if $F > F_{1-\alpha,\,j,\,n-p}$.
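
A sketch of this F test in numpy/scipy for two nested design matrices (the helper name f_test is hypothetical, not from the notes):

```python
import numpy as np
from scipy import stats

def f_test(X_full, X_reduced, y):
    """F test of a reduced model nested in a full model (least squares fits)."""
    def sse(X):
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        r = y - X @ beta
        return r @ r
    n, p = X_full.shape
    j = p - X_reduced.shape[1]                 # number of constraints
    F = ((sse(X_reduced) - sse(X_full)) / j) / (sse(X_full) / (n - p))
    return F, stats.f.sf(F, j, n - p)          # statistic and p-value
```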

Hypothesis tests: general case

Other models: the deviance $DEV(M) = -2\log L(M)$ replaces $SSE(M)$. The difference $D = DEV(R) - DEV(F) \sim \chi^2_j$ (asymptotically).

The denominator in the F statistic is usually either known or based on something like Pearson's $X^2$:
$$\hat\phi = \frac{1}{n-p}\sum_{i=1}^{n} r_i(F)^2.$$
In general, the residuals are whitened: $r(M) = \Sigma(M)^{-1/2}\bigl(Y - X\hat\beta(M)\bigr)$.

Reject $H_0$: "$R$ is true" at level $\alpha$ if $D > \chi^2_{1-\alpha,\,j}$.
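
An illustrative deviance comparison of nested logistic models using statsmodels (the library and simulated data are assumptions; here $j = 1$):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(8)
n = 300
X = sm.add_constant(rng.normal(size=(n, 2)))
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([0.0, 1.0, 0.0])))))

full = sm.GLM(y, X, family=sm.families.Binomial()).fit()
reduced = sm.GLM(y, X[:, :2], family=sm.families.Binomial()).fit()

# D = DEV(R) - DEV(F) ~ chi^2_j asymptotically, with j = 1 constraint here
D = reduced.deviance - full.deviance
print(D, stats.chi2.sf(D, df=1))
```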

Model selection: AIC, BIC, stepwise

Best subsets regression (leaps): adjusted $R^2$, $C_p$.

Akaike Information Criterion (AIC):
$$AIC(M) = -2\log L(M) + 2\,p(M),$$
where $L(M)$ is the likelihood of model $M$ evaluated at the MLE (maximum likelihood estimates).

Schwarz's Bayesian Information Criterion (BIC):
$$BIC(M) = -2\log L(M) + p(M)\log n.$$

Penalized regression can be thought of as model selection as well: choosing the best penalty parameter on the basis of
$$GCV(M) = \sum_{i=1}^{n} \left( \frac{Y_i - \hat Y_i(M)}{1 - \mathrm{Tr}(S(M))/n} \right)^2, \qquad \hat Y(M) = S(M)\,Y.$$
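
As an illustration with statsmodels (a library assumption), AIC and BIC can be compared across nested candidate models on simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 100
X = sm.add_constant(rng.normal(size=(n, 3)))
y = X @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(0, 1.0, n)

# Compare AIC / BIC across nested candidate models; smaller is better
for p in range(1, 5):
    fit = sm.OLS(y, X[:, :p]).fit()
    print(p, fit.aic, fit.bic)   # -2 log L + 2p  and  -2 log L + p log n
```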

That's it! Thanks!