1 INTRODUCTION TO LINEAR MODELS

THE CLASSICAL LINEAR MODEL

- Most commonly used statistical models
- Flexible models
- Well-developed and understood properties
- Ease of interpretation
- Building block for more general models:
  1. General Linear Model
  2. Generalized Linear Model
  3. Generalized Estimating Equations
  4. Generalized Linear Mixed Model, etc.
  5. Hierarchical Generalized Linear Mixed Model, etc.

EXAMPLES:

EXAMPLE 1: Simple linear regression

Objective: Relate weight to blood pressure.

Consider a random sample of n individuals. The i-th patient has weight x_i and blood pressure Y_i (i = 1, 2, ..., n), where

    Y_i = β_0 + β_1 x_i + ε_i,

Y_i is the response variable, x_i is a regressor variable, β_0 and β_1 are regression coefficients (unknown model parameters to be estimated), and ε_i is an error term.
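As a quick numerical illustration of EXAMPLE 1, the least squares estimates have the familiar closed forms β̂_1 = S_xy / S_xx and β̂_0 = ȳ − β̂_1 x̄. The weights and blood pressures below are made-up numbers, purely for illustration:

```python
import numpy as np

# Made-up weights (x, in lb) and blood pressures (y) for n = 5 patients.
x = np.array([150.0, 165.0, 180.0, 200.0, 220.0])
y = np.array([110.0, 118.0, 125.0, 134.0, 146.0])

# Closed-form least squares estimates for Y_i = beta0 + beta1 * x_i + eps_i:
#   beta1_hat = Sxy / Sxx,  beta0_hat = ybar - beta1_hat * xbar
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

print(beta0_hat, beta1_hat)
```

The same estimates fall out of the matrix and projection machinery developed later; the closed form is the p = 2 special case.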

[Figure 1: scatter plot of y (0 to 800) versus x (0 to 15), with two fitted curves labeled "y= x + 3*x^2" and "y= -123 + 49*x".]

EXAMPLE 2: Polynomial regression

Objective: Same as EXAMPLE 1.

    Y_i = β_0 + β_1 x_i + β_2 x_i^2 + ε_i

Is this still a linear model? Yes: it is linear in the unknown coefficients β_0, β_1, β_2, even though it is quadratic in x_i.

EXAMPLE 3: Multiple linear regression

Objective: Relate blood pressure to weight and age.

For the i-th patient, x_{i1} is weight and x_{i2} is age:

    Y_i = β_0 + β_1 x_{i1} + β_2 x_{i2} + β_3 x_{i3} + ε_i,

where x_{i3} = x_{i1} x_{i2} is the weight-by-age interaction term.
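To see concretely that EXAMPLE 2 is linear in the coefficients, we can fit it by ordinary least squares with design-matrix columns 1, x, x². A minimal sketch with fabricated, exactly quadratic data (no noise), so the fit recovers the generating coefficients:

```python
import numpy as np

# Fabricated data generated exactly from y = 1 + 2x + 3x^2 (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 3.0 * x ** 2

# The model is linear in the betas: design matrix columns are 1, x, x^2.
X = np.column_stack([np.ones_like(x), x, x ** 2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # approximately [1. 2. 3.]
```

The interaction model of EXAMPLE 3 works the same way: x_{i3} = x_{i1} x_{i2} is just another column of the design matrix.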

EXAMPLE 4: Data transformations (Seber & Lee, Example 1.2, The Law of Gravity)

The Inverse Square Law states that the force of gravity F between two bodies a distance D apart is given by

    F = c / D^β.

Question: By transforming variables, how can this be viewed as a linear regression model for the parameter β?

    Y_i = β_0 + β_1 x_i + ε_i,

where Y_i = log(F_i), x_i = log(D_i), β_0 = log(c), β_1 = -β, and ε_i is an error term. Seber states that "from experimental data we can estimate β and test whether β = 2."
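A numerical sketch of EXAMPLE 4's transformation. The data here are fabricated, noiseless forces generated from c = 100 and β = 2 (so that β = 2 corresponds to the classical inverse square law); after taking logs, an ordinary least squares fit of log F on log D recovers both constants:

```python
import numpy as np

# Fabricated, noiseless forces from F = c / D^beta with c = 100, beta = 2.
D = np.array([1.0, 2.0, 4.0, 8.0])
F = 100.0 / D ** 2

# The log transform linearizes the model: log F = log c - beta * log D,
# i.e. Y = beta0 + beta1 * x with beta0 = log c and beta1 = -beta.
x = np.log(D)
Y = np.log(F)
Xmat = np.column_stack([np.ones_like(x), x])
b0, b1 = np.linalg.lstsq(Xmat, Y, rcond=None)[0]

print(-b1)         # recovers beta = 2
print(np.exp(b0))  # recovers c = 100
```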

MATRIX REPRESENTATION OF LINEAR MODELS

The general linear model in matrix form:

    \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
    =
    \begin{bmatrix}
      x_{10} & x_{11} & x_{12} & \cdots & x_{1,p-1} \\
      x_{20} & x_{21} & x_{22} & \cdots & x_{2,p-1} \\
      \vdots &        &        &        & \vdots    \\
      x_{n0} & x_{n1} & x_{n2} & \cdots & x_{n,p-1}
    \end{bmatrix}
    \begin{bmatrix} β_0 \\ β_1 \\ \vdots \\ β_{p-1} \end{bmatrix}
    +
    \begin{bmatrix} ε_1 \\ ε_2 \\ \vdots \\ ε_n \end{bmatrix}

Equivalent shorthand form: Y = Xβ + ε, where

- Y (n × 1) is the response vector,
- X (n × p) is the design (or model, or regression) matrix,
- β (p × 1) is the vector of regression coefficients (model parameters),
- ε (n × 1) is the error vector (mean 0).

(a) Example 1:

    \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
    =
    \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
    \begin{bmatrix} β_0 \\ β_1 \end{bmatrix}
    +
    \begin{bmatrix} ε_1 \\ ε_2 \\ \vdots \\ ε_n \end{bmatrix}
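The shorthand Y = Xβ + ε leads directly to computation: when X has full column rank, the least squares estimate solves the normal equations X'X β̂ = X'Y. A minimal sketch on simulated data (all numbers fabricated):

```python
import numpy as np

# Simulate fabricated data from a known linear model Y = X beta + eps.
rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept + 2 covariates
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.normal(scale=0.1, size=n)

# Least squares: solve the normal equations X'X beta_hat = X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # close to (1.0, 2.0, -0.5)
```

In practice one uses a QR- or SVD-based solver (e.g. `np.linalg.lstsq`) rather than forming X'X explicitly, but the normal equations are the algebra behind it.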

(b) Example 2:

    \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
    =
    \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{bmatrix}
    \begin{bmatrix} β_0 \\ β_1 \\ β_2 \end{bmatrix}
    +
    \begin{bmatrix} ε_1 \\ ε_2 \\ \vdots \\ ε_n \end{bmatrix}

(c) Example 3:

    \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
    =
    \begin{bmatrix} 1 & x_{11} & x_{12} & x_{13} \\ 1 & x_{21} & x_{22} & x_{23} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n1} & x_{n2} & x_{n3} \end{bmatrix}
    \begin{bmatrix} β_0 \\ β_1 \\ β_2 \\ β_3 \end{bmatrix}
    +
    \begin{bmatrix} ε_1 \\ ε_2 \\ \vdots \\ ε_n \end{bmatrix}

(d) Example 4:

    \begin{bmatrix} \log(F_1) \\ \log(F_2) \\ \vdots \\ \log(F_n) \end{bmatrix}
    =
    \begin{bmatrix} 1 & \log(D_1) \\ 1 & \log(D_2) \\ \vdots & \vdots \\ 1 & \log(D_n) \end{bmatrix}
    \begin{bmatrix} β_0 \\ β_1 \end{bmatrix}
    +
    \begin{bmatrix} ε_1 \\ ε_2 \\ \vdots \\ ε_n \end{bmatrix}

(e) EXAMPLE 5: One-way analysis of variance

Objective: Compare two treatments for blood pressure.

Consider random samples of J individuals taking one of two blood pressure medications. Y_{ij} is the blood pressure for individual j from treatment group i:

    Y_{ij} = μ + α_i + ε_{ij}

- μ = overall mean blood pressure,
- α_i = effect on blood pressure for treatment i (i = 1, 2),
- ε_{ij} = error term for subject j receiving treatment i.

Alternate model representation:

    Y_{ij} = μ + α_1 I_1 + α_2 I_2 + ε_{ij},

where I_1 is an indicator variable for membership in treatment group 1 and I_2 is an indicator variable for membership in treatment group 2.

(f) EXAMPLE 6: Two-way analysis of variance

Objective: Same as EXAMPLE 5, but we now also consider a patient's sex.

    Y_{ijk} = μ + α_i + β_j + ε_{ijk}

- μ = overall mean blood pressure,
- α_i = effect on blood pressure for treatment i (i = 1, 2),
- β_j = effect on blood pressure for sex j (j = 1, 2),
- ε_{ijk} = error term for subject k of sex j receiving treatment i.

Alternate model representation:

    Y_{ijk} = μ + α_1 I_1 + α_2 I_2 + β_1 J_1 + β_2 J_2 + ε_{ijk},

where I_1 is an indicator variable for membership in treatment group 1, I_2 is an indicator variable for membership in treatment group 2, J_1 indicates sex 1, and J_2 indicates sex 2.
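The indicator-variable representation of EXAMPLE 5 translates directly into a design matrix. A sketch with fabricated blood pressures and J = 3 subjects per group; note that I_1 + I_2 equals the intercept column, so X is not of full rank, yet the fitted values are still unique (they are the group means):

```python
import numpy as np

# Fabricated blood pressures, J = 3 subjects in each of two treatment groups.
y = np.array([120.0, 125.0, 118.0,   # treatment 1
              110.0, 108.0, 115.0])  # treatment 2
group = np.array([1, 1, 1, 2, 2, 2])

# Design matrix columns: overall mean mu, indicators I1 and I2.
X = np.column_stack([np.ones(6), group == 1, group == 2]).astype(float)

# I1 + I2 equals the intercept column, so X has rank 2 and beta_hat is
# not unique; lstsq returns one least squares solution, and the fitted
# values Y_hat = X beta_hat are unique: the group means (121 and 111).
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat
print(y_hat)
```

This is the usual side-condition issue in ANOVA models: software resolves it by dropping a column or constraining the α's, but any choice gives the same Ŷ.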

(g) EXAMPLE 7: Analysis of covariance

Objective: Same as EXAMPLE 5, controlling for age.

    Y_{ij} = μ + α_i + β(x_{ij} - x̄) + ε_{ij}

- μ = overall mean blood pressure,
- α_i = effect on blood pressure for treatment i (i = 1, 2),
- β = slope parameter,
- x_{ij} = age of subject j receiving treatment i,
- x̄ = overall mean age,
- ε_{ij} = error term for subject j receiving treatment i.

Note: The alternative model representations for these ANOVA and ANCOVA models make it clear that these are linear models. Let's continue with the matrix representation of these models.

Example 5 (matrix form):

    \begin{bmatrix} Y_{11} \\ \vdots \\ Y_{1J} \\ Y_{21} \\ \vdots \\ Y_{2J} \end{bmatrix}
    =
    \begin{bmatrix} 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \end{bmatrix}
    \begin{bmatrix} μ \\ α_1 \\ α_2 \end{bmatrix}
    +
    \begin{bmatrix} ε_{11} \\ \vdots \\ ε_{1J} \\ ε_{21} \\ \vdots \\ ε_{2J} \end{bmatrix}

Example 6 (matrix form):

    \begin{bmatrix} Y_{111} \\ \vdots \\ Y_{11K} \\ Y_{121} \\ \vdots \\ Y_{12K} \\ Y_{211} \\ \vdots \\ Y_{21K} \\ Y_{221} \\ \vdots \\ Y_{22K} \end{bmatrix}
    =
    \begin{bmatrix}
      1 & 1 & 0 & 1 & 0 \\
      \vdots & & & & \vdots \\
      1 & 1 & 0 & 1 & 0 \\
      1 & 1 & 0 & 0 & 1 \\
      \vdots & & & & \vdots \\
      1 & 1 & 0 & 0 & 1 \\
      1 & 0 & 1 & 1 & 0 \\
      \vdots & & & & \vdots \\
      1 & 0 & 1 & 1 & 0 \\
      1 & 0 & 1 & 0 & 1 \\
      \vdots & & & & \vdots \\
      1 & 0 & 1 & 0 & 1
    \end{bmatrix}
    \begin{bmatrix} μ \\ α_1 \\ α_2 \\ β_1 \\ β_2 \end{bmatrix}
    +
    \begin{bmatrix} ε_{111} \\ \vdots \\ ε_{11K} \\ ε_{121} \\ \vdots \\ ε_{12K} \\ ε_{211} \\ \vdots \\ ε_{21K} \\ ε_{221} \\ \vdots \\ ε_{22K} \end{bmatrix}

Example 7 (matrix form):

    \begin{bmatrix} Y_{11} \\ \vdots \\ Y_{1J} \\ Y_{21} \\ \vdots \\ Y_{2J} \end{bmatrix}
    =
    \begin{bmatrix}
      1 & 1 & 0 & (x_{11} - x̄) \\
      \vdots & \vdots & \vdots & \vdots \\
      1 & 1 & 0 & (x_{1J} - x̄) \\
      1 & 0 & 1 & (x_{21} - x̄) \\
      \vdots & \vdots & \vdots & \vdots \\
      1 & 0 & 1 & (x_{2J} - x̄)
    \end{bmatrix}
    \begin{bmatrix} μ \\ α_1 \\ α_2 \\ β \end{bmatrix}
    +
    \begin{bmatrix} ε_{11} \\ \vdots \\ ε_{1J} \\ ε_{21} \\ \vdots \\ ε_{2J} \end{bmatrix}

In Summary: Linear models have the form Y = Xβ + ε, where

- Y: n × 1 response vector,
- X: n × p model matrix,
- β: p × 1 vector of unknown regression parameters,
- ε: n × 1 mean-zero random error vector.

Notes:

1. Usually x_{i0} = 1 for all i. That is, usually there is an intercept β_0 in the model and the first column of the design matrix X is all 1's.
2. x_{i0}, x_{i1}, ..., x_{i,p-1} are called the predictor variables, or regressor variables, or the covariates. They are the data.
3. "Linear model" means the model is linear in the unknown regression coefficients β_0, β_1, ..., β_{p-1}.
4. Instead of matrices, we can write the model in terms of vectors:

       Y = \sum_{j=0}^{p-1} β_j x_j + ε,

   where x_j = (x_{1j}, x_{2j}, ..., x_{nj})'. (Also note that we will use the convention in this course that a vector is a column vector.)
5. ε is the random part of the model. Right now we just assume that E(ε) = 0. Later, we will make more assumptions about the distribution of ε.
6. Y is random because ε is random: Y inherits randomness from ε.
7. We can thus evaluate

       E[Y] = E[\sum_{j=0}^{p-1} β_j x_j + ε] = E[\sum_{j=0}^{p-1} β_j x_j] + E[ε] = \sum_{j=0}^{p-1} β_j x_j.

8. The vector E[Y] is a linear combination of the x_j.
9. E[Y] ∈ span(x_0, x_1, ..., x_{p-1}) ≡ Ω.

FACT: We obtain least squares estimators (LSEs) of the β_j, denoted β̂_j, by projecting Y onto Ω. Note that Y is n-dimensional and Ω has dimension p. The projection of Y onto Ω is denoted Ŷ.

In-Class Exercise: Model Y_i = β x_i + ε_i (linear regression through the origin).

Two datasets:

1. {(x, y)} = {(2, 1), (0, 1)}
2. {(x, y)} = {(1, 2), (1/2, 2)}

For each dataset:

1. Write out the model with vectors and matrices (using the data).
2. Show the vectors y and x on a graph.
3. Identify (on your graph) Ω.
4. Plot ŷ = Proj_Ω(y).
5. Plot ε̂ = y − ŷ; identify Ω^⊥.
6. What is the dimension of Ω? What is the dimension of Ω^⊥?
7. Also make a scatterplot of the data and sketch the least squares line.
8. Alternatively, if we fit the model Y_i = β_0 + β_1 x_i + ε_i, how do your answers to the previous questions change?
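The projections in the In-Class Exercise can be checked numerically. For regression through the origin, Ω = span(x), β̂ = x'y / x'x, and Proj_Ω(y) = β̂ x. A sketch using the two exercise datasets:

```python
import numpy as np

def proj(y, x):
    """Projection of y onto Omega = span(x): beta_hat = x'y / x'x, y_hat = beta_hat * x."""
    return (x @ y / (x @ x)) * x

# Dataset 1: (x, y) pairs (2, 1) and (0, 1).
x1, y1 = np.array([2.0, 0.0]), np.array([1.0, 1.0])
# Dataset 2: (x, y) pairs (1, 2) and (1/2, 2).
x2, y2 = np.array([1.0, 0.5]), np.array([2.0, 2.0])

print(proj(y1, x1))  # beta_hat = 2/4 = 0.5,    y_hat = (1.0, 0.0)
print(proj(y2, x2))  # beta_hat = 3/1.25 = 2.4, y_hat = (2.4, 1.2)

# In each case the residual y - y_hat is orthogonal to Omega:
print((y1 - proj(y1, x1)) @ x1, (y2 - proj(y2, x2)) @ x2)  # both 0
```

Here n = 2 and p = 1, so Ω is a line in the plane and Ω^⊥ is the perpendicular line through the origin, which is why the pictures in the exercise are drawable.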

[Figure: four panels. Top row: vector diagrams (dimension 1 vs. dimension 2) for Dataset 1 and Dataset 2. Bottom row: scatterplots of y vs. x for each dataset.]