MATH 829: Introduction to Data Mining and Analysis Linear Regression: statistical tests


MATH 829: Introduction to Data Mining and Analysis (1/16)
Linear Regression: statistical tests
Dominique Guillot
Department of Mathematical Sciences, University of Delaware
February 17, 2016

Statistical hypothesis testing (2/16)

Suppose we have a linear model for some data:

    Y = β_0 + β_1 X_1 + ... + β_p X_p + ε.

An important problem is to identify which variables are really useful in predicting Y. We want to decide, with some level of confidence, whether β_i = 0 or not. We also want to test whether groups of coefficients {β_{i_k} : k = 1, ..., l} are zero.

Statistical hypothesis testing (cont.) (3/16)

Recall the steps of a statistical test:
1. State a null hypothesis H_0 and an alternative hypothesis H_1.
2. Construct an appropriate test statistic.
3. Derive the distribution of the test statistic under the null hypothesis.
4. Select a significance level α (typically 5% or 1%).
5. Compute the test statistic and decide whether the null hypothesis is rejected at the given significance level.

Example (4/16)

Example: Suppose X ∼ N(µ, 1) with µ unknown. We want to test:

    H_0: µ = 0   vs.   H_1: µ ≠ 0.

We have an iid sample X_1, ..., X_n ∼ N(µ, 1).

Recall: if X ∼ N(µ_1, σ_1²) and Y ∼ N(µ_2, σ_2²) are independent, then X + Y ∼ N(µ_1 + µ_2, σ_1² + σ_2²).

Therefore, under H_0, we have µ̂ = (1/n) Σ_{i=1}^n X_i ∼ N(0, 1/n). Test statistic: √n µ̂ ∼ N(0, 1).

Suppose we observed √n µ̂ = k. We compute:

    P(−z_α ≤ √n µ̂ ≤ z_α) = P(−z_α ≤ N(0, 1) ≤ z_α) = 1 − α.

Reject the null hypothesis if k ∉ [−z_α, z_α].
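The z-test above is easy to mirror in code. A minimal sketch, assuming scipy is available (the seed and sample size are made up for illustration); here z_alpha is the two-sided critical value Φ⁻¹(1 − α/2), so that P(−z_α ≤ Z ≤ z_α) = 1 − α as on the slide:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 100
x = rng.normal(loc=0.0, scale=1.0, size=n)  # iid sample drawn under H0: mu = 0

mu_hat = x.mean()
z_stat = np.sqrt(n) * mu_hat            # ~ N(0, 1) under H0
z_alpha = norm.ppf(1 - 0.05 / 2)        # two-sided 5% critical value, ~1.96

reject = abs(z_stat) > z_alpha          # reject H0 if the statistic falls outside [-z_alpha, z_alpha]
```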

Example (cont.) (5/16)

For example, suppose α = 0.05 and √n µ̂ = 2.2. If Z ∼ N(0, 1), then

    P(−1.96 ≤ Z ≤ 1.96) = 0.95.

Therefore, it is very unlikely to observe √n µ̂ = 2.2 if µ = 0. We reject the null hypothesis. In fact, P(−2.2 ≤ Z ≤ 2.2) ≈ 0.972, so our p-value is ≈ 0.028.

Type I error: H_0 true, but rejected (false positive). Controlled by the level α.
Type II error: H_0 false, but not rejected (false negative). Related to the power of the test.
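The numbers on this slide can be reproduced with scipy (an illustrative check, not part of the original slides):

```python
from scipy.stats import norm

# P(-1.96 <= Z <= 1.96) for Z ~ N(0, 1): the 95% coverage used on the slide
coverage = norm.cdf(1.96) - norm.cdf(-1.96)   # ~0.95

# Two-sided p-value for the observed statistic sqrt(n)*mu_hat = 2.2
p_value = 2 * (1 - norm.cdf(2.2))             # ~0.028
```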

Testing if coefficients are zero (6/16)

In practice, we often include unnecessary variables in linear models. These variables bias our estimator and can lead to poor performance. We need ways of identifying a good set of predictors. We now discuss a classical approach that uses statistical tests.

Before, we tested whether the mean of a N(µ, 1) is zero:

    H_0: µ = 0   vs.   H_1: µ ≠ 0,

assuming σ² = 1 is known. What if the variance is unknown?
Sample variance:

    s² = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)²,   where x̄ = (1/n) Σ_{i=1}^n x_i.

(7/16)

In general, suppose X ∼ N(µ, σ²) with σ² known and we want to test

    H_0: µ = µ_0   vs.   H_1: µ ≠ µ_0.

Under the H_0 hypothesis, we have

    X̄ := (1/n) Σ_{i=1}^n X_i ∼ N(µ_0, σ²/n).

Therefore, we use the test statistic

    Z = (X̄ − µ_0)/(σ/√n) ∼ N(0, 1).

If the variance is unknown, we replace σ by its sample version s:

    T = (X̄ − µ_0)/(s/√n) ∼ t_{n−1}.

Review: the Student distribution (8/16)

The Student t_ν distribution with ν degrees of freedom has density

    f_ν(t) = [Γ((ν+1)/2) / (√(νπ) Γ(ν/2))] (1 + t²/ν)^{−(ν+1)/2},

where Γ is the Gamma function. When X_1, ..., X_n are iid N(µ, σ²), then

    (X̄ − µ)/(s/√n) ∼ t_{n−1}.
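As a sanity check, the density formula above can be compared against scipy's implementation (a sketch; the degrees of freedom and grid of evaluation points are arbitrary):

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import t

def t_pdf(x, nu):
    """Density of the Student t distribution with nu degrees of freedom."""
    c = gamma((nu + 1) / 2) / (np.sqrt(nu * np.pi) * gamma(nu / 2))
    return c * (1 + x**2 / nu) ** (-(nu + 1) / 2)

xs = np.linspace(-4, 4, 9)
max_err = np.max(np.abs(t_pdf(xs, 5) - t.pdf(xs, df=5)))  # should be ~0
```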

(9/16)

Back to testing regression coefficients: suppose

    y_i = x_{i,1}β_1 + x_{i,2}β_2 + ... + x_{i,p}β_p + ε_i,

where (x_{ij}) is a fixed matrix and the ε_i are iid N(0, σ²). We saw that this implies

    β̂ ∼ N(β, (X^T X)^{−1}σ²).

In particular, β̂_i ∼ N(β_i, v_i σ²), where v_i is the i-th diagonal element of (X^T X)^{−1}.

We want to test:

    H_0: β_i = 0   vs.   H_1: β_i ≠ 0.

Note: v_i is known, but σ is unknown.

(10/16)

Recall: y_i = x_{i,1}β_1 + x_{i,2}β_2 + ... + x_{i,p}β_p + ε_i.
Problem: how do we estimate σ?
Let ε̂_i = y_i − (x_{i,1}β̂_1 + x_{i,2}β̂_2 + ... + x_{i,p}β̂_p).
What about σ̂² = (1/n) Σ_{i=1}^n ε̂_i²? What is E(σ̂²)?

We have y = Xβ + ε and β̂ = (X^T X)^{−1}X^T y. Thus,

    ε̂ = y − Xβ̂
       = y − X(X^T X)^{−1}X^T y
       = (I − X(X^T X)^{−1}X^T)y
       = (I − X(X^T X)^{−1}X^T)(Xβ + ε)
       = (I − X(X^T X)^{−1}X^T)ε     (since X(X^T X)^{−1}X^T X = X)
       = Mε.

(11/16)

We showed ε̂ = Mε, where M := I − X(X^T X)^{−1}X^T. Now

    Σ_{i=1}^n ε̂_i² = ε̂^T ε̂ = ε^T M^T Mε.

Note: M^T = M and

    M^T M = M² = (I − X(X^T X)^{−1}X^T)(I − X(X^T X)^{−1}X^T) = I − X(X^T X)^{−1}X^T = M.

(In other words, M is idempotent.) Therefore,

    Σ_{i=1}^n ε̂_i² = ε^T Mε.

Now,

    E(ε^T Mε) = E(tr(Mεε^T)) = tr E(Mεε^T) = tr(M E(εε^T)) = tr(Mσ²I) = σ² tr M.

(12/16)

We proved E(Σ_{i=1}^n ε̂_i²) = σ² tr M, where M = I − X(X^T X)^{−1}X^T. (Here I = I_n, the n × n identity matrix.) What is tr M?

Recall tr(AB) = tr(BA). Thus,

    tr M = tr(I − X(X^T X)^{−1}X^T)
         = n − tr(X(X^T X)^{−1}X^T)
         = n − tr(X^T X(X^T X)^{−1})
         = n − tr(I_p)
         = n − p.

Therefore,

    (1/(n−p)) E(Σ_{i=1}^n ε̂_i²) = σ².
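The identities above (M symmetric idempotent, tr M = n − p) are easy to verify numerically on a random design matrix (an illustrative sketch, assuming X has full column rank):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 4
X = rng.normal(size=(n, p))               # random full-rank design (with high probability)

H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix X(X^T X)^{-1}X^T
M = np.eye(n) - H

idempotent_err = np.max(np.abs(M @ M - M))  # M^2 = M, so this should be ~0
symmetry_err = np.max(np.abs(M - M.T))      # M^T = M
trace_M = np.trace(M)                       # should equal n - p
```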

(13/16)

As a result of the previous calculation, our estimator of the variance σ² in the regression model will be

    s² = (1/(n−p)) Σ_{i=1}^n (y_i − ŷ_i)²,

where ŷ_i := x_{i,1}β̂_1 + x_{i,2}β̂_2 + ... + x_{i,p}β̂_p is our prediction of y_i.

Our test statistic is

    T = β̂_i / (s √v_i),   v_i = ((X^T X)^{−1})_{ii}.

Under the null hypothesis H_0: β_i = 0, one can show that the above T statistic has a Student distribution with n − p degrees of freedom: T ∼ t_{n−p}.

Thus, to test if β_i = 0, we compute the value of the T statistic, say T = T̂, and reject the null hypothesis (at the α = 5% level) if P(|t_{n−p}| ≥ |T̂|) ≤ 0.05.

Important: this procedure cannot be iterated to remove multiple coefficients. We will see how this is done later.
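A minimal sketch of this procedure on simulated data (the design, true coefficients, and noise level are made up for illustration; the last true coefficient is set to zero):

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(3)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept + 2 regressors
beta_true = np.array([1.0, 2.0, 0.0])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # least squares estimate
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)                  # unbiased estimate of sigma^2
v = np.diag(np.linalg.inv(X.T @ X))           # v_i = ((X^T X)^{-1})_{ii}

T_stats = beta_hat / np.sqrt(s2 * v)          # ~ t_{n-p} under H0: beta_i = 0
p_values = 2 * (1 - t.cdf(np.abs(T_stats), df=n - p))
```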

Confidence intervals for the regression coefficients (14/16)

Recall that β̂_i ∼ N(β_i, v_i σ²). Using our estimate s² for σ², we can construct a 1 − 2α confidence interval for β_i:

    (β̂_i − z_{1−α} √v_i s,  β̂_i + z_{1−α} √v_i s).

Here z_{1−α} is the (1−α)-th percentile of the N(0, 1) distribution, i.e., P(Z ≤ z_{1−α}) = 1 − α.
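The interval can be computed directly (a sketch; the plugged-in values of β̂_i, s, and v_i are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

alpha = 0.025                        # gives a 1 - 2*alpha = 95% interval
beta_hat_i, s, v_i = 1.8, 0.9, 0.04  # hypothetical fitted values
z = norm.ppf(1 - alpha)              # (1 - alpha)-th percentile of N(0, 1), ~1.96

half_width = z * np.sqrt(v_i) * s
ci = (beta_hat_i - half_width, beta_hat_i + half_width)
```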

Python (15/16)

Unfortunately, scikit-learn doesn't compute t-statistics and confidence intervals. However, the module statsmodels provides exactly what we need.

    import numpy as np
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Load data
    dat = sm.datasets.get_rdataset("Guerry", "HistData").data

    # Fit regression model (using the natural log
    # of one of the regressors)
    results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)',
                      data=dat).fit()

    # Inspect the results
    print(results.summary())

Python (cont.) (16/16)
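The summary table shown on this slide is not reproduced in this transcription, but the same quantities (T statistics, p-values, confidence intervals) can be read off a fitted model directly via the standard statsmodels attributes `tvalues`, `pvalues`, and `conf_int`. A sketch on simulated data (the variable names y, x1, x2 and the true coefficients are made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "x1": rng.normal(size=60),
    "x2": rng.normal(size=60),
})
df["y"] = 1.0 + 2.0 * df["x1"] + rng.normal(size=60)  # x2 has no true effect

results = smf.ols("y ~ x1 + x2", data=df).fit()

tvals = results.tvalues            # T statistic for each coefficient
pvals = results.pvalues            # two-sided p-values from t_{n-p}
ci = results.conf_int(alpha=0.05)  # 95% confidence intervals, one row per coefficient
```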


More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

REGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University

REGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University REGRESSION ITH SPATIALL MISALIGNED DATA Lisa Madsen Oregon State University David Ruppert Cornell University SPATIALL MISALIGNED DATA 10 X X X X X X X X 5 X X X X X 0 X 0 5 10 OUTLINE 1. Introduction 2.

More information

Multivariate Linear Regression Models

Multivariate Linear Regression Models Multivariate Linear Regression Models Regression analysis is used to predict the value of one or more responses from a set of predictors. It can also be used to estimate the linear association between

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A = Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation Using the least squares estimator for β we can obtain predicted values and compute residuals: Ŷ = Z ˆβ = Z(Z Z) 1 Z Y ˆɛ = Y Ŷ = Y Z(Z Z) 1 Z Y = [I Z(Z Z) 1 Z ]Y. The usual decomposition

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Hypothesis Testing One Sample Tests

Hypothesis Testing One Sample Tests STATISTICS Lecture no. 13 Department of Econometrics FEM UO Brno office 69a, tel. 973 442029 email:jiri.neubauer@unob.cz 12. 1. 2010 Tests on Mean of a Normal distribution Tests on Variance of a Normal

More information

Chapter 12. Analysis of variance

Chapter 12. Analysis of variance Serik Sagitov, Chalmers and GU, January 9, 016 Chapter 1. Analysis of variance Chapter 11: I = samples independent samples paired samples Chapter 1: I 3 samples of equal size J one-way layout two-way layout

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Need for Several Predictor Variables

Need for Several Predictor Variables Multiple regression One of the most widely used tools in statistical analysis Matrix expressions for multiple regression are the same as for simple linear regression Need for Several Predictor Variables

More information

Chapter 14. Linear least squares

Chapter 14. Linear least squares Serik Sagitov, Chalmers and GU, March 5, 2018 Chapter 14 Linear least squares 1 Simple linear regression model A linear model for the random response Y = Y (x) to an independent variable X = x For a given

More information

Managing Uncertainty

Managing Uncertainty Managing Uncertainty Bayesian Linear Regression and Kalman Filter December 4, 2017 Objectives The goal of this lab is multiple: 1. First it is a reminder of some central elementary notions of Bayesian

More information

ST430 Exam 1 with Answers

ST430 Exam 1 with Answers ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.

More information

High-dimensional regression

High-dimensional regression High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and

More information

Lecture 11: Regression Methods I (Linear Regression)

Lecture 11: Regression Methods I (Linear Regression) Lecture 11: Regression Methods I (Linear Regression) Fall, 2017 1 / 40 Outline Linear Model Introduction 1 Regression: Supervised Learning with Continuous Responses 2 Linear Models and Multiple Linear

More information

The outline for Unit 3

The outline for Unit 3 The outline for Unit 3 Unit 1. Introduction: The regression model. Unit 2. Estimation principles. Unit 3: Hypothesis testing principles. 3.1 Wald test. 3.2 Lagrange Multiplier. 3.3 Likelihood Ratio Test.

More information

Hypothesis Testing The basic ingredients of a hypothesis test are

Hypothesis Testing The basic ingredients of a hypothesis test are Hypothesis Testing The basic ingredients of a hypothesis test are 1 the null hypothesis, denoted as H o 2 the alternative hypothesis, denoted as H a 3 the test statistic 4 the data 5 the conclusion. The

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1)

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1) Summary of Chapter 7 (Sections 7.2-7.5) and Chapter 8 (Section 8.1) Chapter 7. Tests of Statistical Hypotheses 7.2. Tests about One Mean (1) Test about One Mean Case 1: σ is known. Assume that X N(µ, σ

More information

M(t) = 1 t. (1 t), 6 M (0) = 20 P (95. X i 110) i=1

M(t) = 1 t. (1 t), 6 M (0) = 20 P (95. X i 110) i=1 Math 66/566 - Midterm Solutions NOTE: These solutions are for both the 66 and 566 exam. The problems are the same until questions and 5. 1. The moment generating function of a random variable X is M(t)

More information

MLES & Multivariate Normal Theory

MLES & Multivariate Normal Theory Merlise Clyde September 6, 2016 Outline Expectations of Quadratic Forms Distribution Linear Transformations Distribution of estimates under normality Properties of MLE s Recap Ŷ = ˆµ is an unbiased estimate

More information

Chapter 2 Multiple Regression I (Part 1)

Chapter 2 Multiple Regression I (Part 1) Chapter 2 Multiple Regression I (Part 1) 1 Regression several predictor variables The response Y depends on several predictor variables X 1,, X p response {}}{ Y predictor variables {}}{ X 1, X 2,, X p

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of

More information

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing STAT763: Applied Regression Analysis Multiple linear regression 4.4 Hypothesis testing Chunsheng Ma E-mail: cma@math.wichita.edu 4.4.1 Significance of regression Null hypothesis (Test whether all β j =

More information

So far our focus has been on estimation of the parameter vector β in the. y = Xβ + u

So far our focus has been on estimation of the parameter vector β in the. y = Xβ + u Interval estimation and hypothesis tests So far our focus has been on estimation of the parameter vector β in the linear model y i = β 1 x 1i + β 2 x 2i +... + β K x Ki + u i = x iβ + u i for i = 1, 2,...,

More information

ST 740: Linear Models and Multivariate Normal Inference

ST 740: Linear Models and Multivariate Normal Inference ST 740: Linear Models and Multivariate Normal Inference Alyson Wilson Department of Statistics North Carolina State University November 4, 2013 A. Wilson (NCSU STAT) Linear Models November 4, 2013 1 /

More information

Problem Selected Scores

Problem Selected Scores Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected

More information

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation. Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1

More information

Lecture 14 Simple Linear Regression

Lecture 14 Simple Linear Regression Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression ST 430/514 Recall: a regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates).

More information

Econometrics. Week 11. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 11. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 11 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 30 Recommended Reading For the today Advanced Time Series Topics Selected topics

More information

Math 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University

Math 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University Math 5305 Notes Diagnostics and Remedial Measures Jesse Crawford Department of Mathematics Tarleton State University (Tarleton State University) Diagnostics and Remedial Measures 1 / 44 Model Assumptions

More information

ANALYSIS OF VARIANCE AND QUADRATIC FORMS

ANALYSIS OF VARIANCE AND QUADRATIC FORMS 4 ANALYSIS OF VARIANCE AND QUADRATIC FORMS The previous chapter developed the regression results involving linear functions of the dependent variable, β, Ŷ, and e. All were shown to be normally distributed

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Chapter 17: Undirected Graphical Models

Chapter 17: Undirected Graphical Models Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)

More information

Lecture 4 Multiple linear regression

Lecture 4 Multiple linear regression Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters

More information

The linear model is the most fundamental of all serious statistical models encompassing:

The linear model is the most fundamental of all serious statistical models encompassing: Linear Regression Models: A Bayesian perspective Ingredients of a linear model include an n 1 response vector y = (y 1,..., y n ) T and an n p design matrix (e.g. including regressors) X = [x 1,..., x

More information

Answers to Problem Set #4

Answers to Problem Set #4 Answers to Problem Set #4 Problems. Suppose that, from a sample of 63 observations, the least squares estimates and the corresponding estimated variance covariance matrix are given by: bβ bβ 2 bβ 3 = 2

More information

The assumptions are needed to give us... valid standard errors valid confidence intervals valid hypothesis tests and p-values

The assumptions are needed to give us... valid standard errors valid confidence intervals valid hypothesis tests and p-values Statistical Consulting Topics The Bootstrap... The bootstrap is a computer-based method for assigning measures of accuracy to statistical estimates. (Efron and Tibshrani, 1998.) What do we do when our

More information