Math 423/533: The Main Theoretical Topics


2 Notation

sample size $n$, data index $i$; number of predictors $p$ ($p = 2$ for simple linear regression);
$y_i$: response for individual $i$;
$x_i = (x_{i1}, \ldots, x_{ip})$: $(1 \times p)$ row vector;
$X$: $(n \times p)$ matrix containing all predictors for all individuals $i = 1, \ldots, n$;
$y = (y_1, \ldots, y_n)'$: $(n \times 1)$ column vector;
$Y_i$ and $Y$: random variables corresponding to the responses.

3 Linear model assumptions

For $i = 1, \ldots, n$,

$$\mathbb{E}_{Y_i|x_i}[Y_i \mid x_i] = x_i\beta = \sum_{j=1}^{p} \beta_j x_{ij}, \qquad \text{Var}_{Y_i|x_i}[Y_i \mid x_i] = \sigma^2$$

where $\beta = (\beta_1, \ldots, \beta_p)'$ is the $(p \times 1)$ vector of regression coefficients, and $\sigma^2 > 0$ is the error variance. We assume also that $Y_1, \ldots, Y_n$ are independent given $x_1, \ldots, x_n$.

4 Linear model assumptions

In vector form,

$$\mathbb{E}_{Y|X}[Y \mid X] = X\beta \quad (n \times 1) \qquad \text{and} \qquad \text{Var}_{Y|X}[Y \mid X] = \sigma^2 I_n \quad (n \times n).$$

This is equivalent to a model specification of $Y = X\beta + \epsilon$, where

$$\mathbb{E}_{\epsilon|X}[\epsilon \mid X] = 0_n, \qquad \text{Var}_{\epsilon|X}[\epsilon \mid X] = \sigma^2 I_n.$$

5 The intercept

We usually consider including the special predictor $x_{i0} \equiv 1$, $i = 1, \ldots, n$, and specify the model

$$\mathbb{E}_{Y_i|x_i}[Y_i \mid x_i] = x_i\beta = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} = \sum_{j=0}^{p} \beta_j x_{ij}.$$

This model has $p + 1$ $\beta$ parameters. We will let $p$ count the total number of predictors, including the intercept term.

6 Simple linear regression

We specify

$$\mathbb{E}_{Y_i|x_i}[Y_i \mid x_i] = \beta_0 + \beta_1 x_{i1} = [1 \;\; x_{i1}]\begin{bmatrix}\beta_0 \\ \beta_1\end{bmatrix} = x_i\beta$$

with $p = 2$ parameters in the regression model. This model posits a straight-line relationship between $x$ and $y$.

7 Least squares estimation in simple linear regression

On the basis of data $(x_{i1}, y_i)$, $i = 1, \ldots, n$, we choose the line of best fit according to the least squares principle. We estimate the parameters $\beta = (\beta_0, \beta_1)'$ by $\widehat{\beta}$, where

$$\widehat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1})^2 = \arg\min_{\beta} S(\beta)$$

and where we may also write, in vector form, $S(\beta) = (y - X\beta)'(y - X\beta)$. We achieve the minimization by calculus.

8 The Normal Equations

We solve

$$\frac{\partial S(\beta)}{\partial \beta_0} = -2\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1}) = 0, \qquad \frac{\partial S(\beta)}{\partial \beta_1} = -2\sum_{i=1}^{n} x_{i1}(y_i - \beta_0 - \beta_1 x_{i1}) = 0.$$

These two equations can be written

$$n\beta_0 + \beta_1 \sum_{i=1}^{n} x_{i1} = \sum_{i=1}^{n} y_i, \qquad \beta_0 \sum_{i=1}^{n} x_{i1} + \beta_1 \sum_{i=1}^{n} x_{i1}^2 = \sum_{i=1}^{n} x_{i1} y_i,$$

or, in matrix form, $(X'X)\beta = X'y$.

9 The Normal Equations

These equations are the Normal Equations. If the symmetric $(p \times p) = (2 \times 2)$ matrix $X'X$ is non-singular, then we may write the solution

$$\widehat{\beta} = (X'X)^{-1}X'y$$

which yields a $(p \times 1) = (2 \times 1)$ vector of least squares estimates. Explicitly, we have

$$\begin{bmatrix} \widehat{\beta}_0 \\ \widehat{\beta}_1 \end{bmatrix} = \begin{bmatrix} n & \sum x_{i1} \\ \sum x_{i1} & \sum x_{i1}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum y_i \\ \sum x_{i1} y_i \end{bmatrix}.$$
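For concreteness, here is a minimal numpy sketch (not part of the original notes; the simulated data and true coefficients are arbitrary choices) that builds the design matrix and solves the normal equations; `np.linalg.lstsq` recovers the same estimates without forming $X'X$ explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x1 + rng.normal(size=n)       # true beta0 = 2, beta1 = 0.5

X = np.column_stack([np.ones(n), x1])         # (n x 2) design matrix with intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # solves (X'X) beta = X'y

beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
print(beta_hat)                               # close to [2.0, 0.5]
```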

10 Estimates

The required inverse is

$$\begin{bmatrix} n & \sum x_{i1} \\ \sum x_{i1} & \sum x_{i1}^2 \end{bmatrix}^{-1} = \frac{1}{n\sum x_{i1}^2 - \left(\sum x_{i1}\right)^2} \begin{bmatrix} \sum x_{i1}^2 & -\sum x_{i1} \\ -\sum x_{i1} & n \end{bmatrix}.$$

We write

$$S_{xx} = \sum_{i=1}^{n} x_{i1}^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_{i1}\right)^2 = \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2$$

$$S_{xy} = \sum_{i=1}^{n} x_{i1} y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_{i1}\right)\left(\sum_{i=1}^{n} y_i\right) = \sum_{i=1}^{n} y_i (x_{i1} - \bar{x}_1).$$

11 Estimates (cont.)

Thus, after some algebra,

$$\widehat{\beta}_0 = \bar{y} - \widehat{\beta}_1 \bar{x}_1, \qquad \widehat{\beta}_1 = \frac{S_{xy}}{S_{xx}}.$$
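The closed-form expressions can be checked against the matrix solution; a minimal sketch, under the same simulated-data assumptions as in the previous example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x1 + rng.normal(size=n)

xbar, ybar = x1.mean(), y.mean()
Sxx = np.sum((x1 - xbar) ** 2)
Sxy = np.sum(y * (x1 - xbar))

beta1_hat = Sxy / Sxx
beta0_hat = ybar - beta1_hat * xbar

# Agrees with the normal-equations solution
X = np.column_stack([np.ones(n), x1])
assert np.allclose([beta0_hat, beta1_hat],
                   np.linalg.solve(X.T @ X, X.T @ y))
```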

12 Residuals and fitted values

Define, for $i = 1, \ldots, n$,

$$e_i = y_i - (\widehat{\beta}_0 + \widehat{\beta}_1 x_{i1}) = y_i - \widehat{y}_i$$

where $e_i$ is the $i$th residual and $\widehat{y}_i$ is the $i$th fitted value.

13 Statistical properties of least squares estimators

It is evident from the formula

$$\widehat{\beta} = (X'X)^{-1}X'y = Ay, \quad \text{say, where} \quad A = (X'X)^{-1}X',$$

that the least squares estimates are merely linear combinations of the observed responses $y = (y_1, \ldots, y_n)'$. Specifically, in the simple linear regression,

$$\widehat{\beta}_0 = \sum_{i=1}^{n}\left(\frac{1}{n} - \bar{x}_1 c_i\right) y_i, \qquad \widehat{\beta}_1 = \sum_{i=1}^{n} c_i y_i$$

where, for $i = 1, \ldots, n$, $c_i = \dfrac{x_{i1} - \bar{x}_1}{S_{xx}}$.

14 Statistical properties of the estimators

In random variable form, we have the estimators

$$\widehat{\beta} = (X'X)^{-1}X'Y = AY$$

and thus, under the model assumptions $\mathbb{E}_{Y|X}[Y \mid X] = X\beta$ and $\text{Var}_{Y|X}[Y \mid X] = \sigma^2 I_n$, we can study distributional properties of the estimators.

15 Statistical properties of the estimators (cont.)

We have, using elementary properties of expectation and variance,

$$\mathbb{E}_{Y|X}[\widehat{\beta} \mid X] = \beta \quad (p \times 1), \qquad \text{Var}_{Y|X}[\widehat{\beta} \mid X] = \sigma^2 (X'X)^{-1} \quad (p \times p)$$

with $p = 2$. Explicitly,

$$\text{Var}_{Y|X}[\widehat{\beta}_0 \mid X] = \sigma^2 \frac{\sum x_{i1}^2}{n S_{xx}} = \sigma^2\left(\frac{1}{n} + \frac{(\bar{x}_1)^2}{S_{xx}}\right), \qquad \text{Var}_{Y|X}[\widehat{\beta}_1 \mid X] = \frac{\sigma^2}{S_{xx}}.$$

16 Residuals and Fitted values

For the simple linear regression:

1. $\sum_{i=1}^{n} e_i = 0$, so that $\bar{e} = 0$;
2. $\sum_{i=1}^{n} x_{i1} e_i = 0$;
3. $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \widehat{y}_i$;
4. $\sum_{i=1}^{n} \widehat{y}_i e_i = 0$;

that is, the observed residual vector $e$ is orthogonal to the observed $(n \times 1)$ vectors $x = (x_{11}, \ldots, x_{n1})'$, $1_n$, and $\widehat{y} = (\widehat{y}_1, \ldots, \widehat{y}_n)'$.

17 Residuals and Fitted values (cont.)

These results arise because $\widehat{\beta}$ solves

$$X'(y - X\widehat{\beta}) = 0_p$$

and the results above follow immediately.

18 Estimating $\sigma^2$

Let

$$SS_{Res} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \widehat{y}_i)^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 - \widehat{\beta}_1 S_{xy} = SS_T - \widehat{\beta}_1 S_{xy},$$

say, where

$$SS_T = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2.$$

We study the statistical properties of the random variable

$$\sum_{i=1}^{n} (Y_i - \widehat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - x_i\widehat{\beta})^2 = (Y - X\widehat{\beta})'(Y - X\widehat{\beta})$$

19 Estimating $\sigma^2$ (cont.)

where $\widehat{\beta} = (X'X)^{-1}X'Y$ is the vector of least squares estimators.

20 Estimating $\sigma^2$ (cont.)

But

$$\widehat{Y} = X\widehat{\beta} = X(X'X)^{-1}X'Y = HY,$$

say, where $H = X(X'X)^{-1}X'$ is the hat matrix. We can show that $H$ is symmetric, and that $HH = H$, so $H$ is idempotent.
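These properties are easy to verify numerically; a small sketch (the design matrix here is simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])  # n x p, p = 2

H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
assert np.allclose(H, H.T)                  # symmetric
assert np.allclose(H @ H, H)                # idempotent
assert np.isclose(np.trace(H), X.shape[1])  # trace(H) = p
```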

21 Estimating $\sigma^2$ (cont.)

Now consider the simpler model where dependence on $x_{i1}$ is omitted, and we merely have an intercept term. Predictions in this model use the $(n \times 1)$ matrix $X = (1, 1, \ldots, 1)' = 1_n$, yielding the corresponding hat matrix

$$H_1 = 1_n (1_n' 1_n)^{-1} 1_n' = \frac{1}{n} 1_n 1_n'$$

which is merely the $(n \times n)$ matrix with all elements equal to $1/n$.

22 Estimating $\sigma^2$ (cont.)

We have that

$$SS_{Res} = (Y - X\widehat{\beta})'(Y - X\widehat{\beta}) = Y'(I_n - H)Y$$

where the $(n \times n)$ matrix $(I_n - H)$ is symmetric and idempotent. Now, we have the sum of squares decomposition

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \widehat{y}_i)^2 + \sum_{i=1}^{n} (\widehat{y}_i - \bar{y})^2$$

or

$$SS_T = SS_{Res} + SS_R.$$

23 Estimating $\sigma^2$ (cont.)

Similarly to the previous result, we have

$$SS_T = Y'(I_n - H_1)Y \qquad \text{and} \qquad SS_R = Y'(H - H_1)Y,$$

yielding the representation

$$Y'(I_n - H_1)Y = Y'(I_n - H)Y + Y'(H - H_1)Y$$

where the $(n \times n)$ matrices $(I_n - H_1)$ and $(H - H_1)$ are also symmetric and idempotent.

24 Estimating $\sigma^2$ (cont.)

Using the result for the expectation of a quadratic form: if $V$ is a $k$-dimensional random vector with $\mathbb{E}[V] = \mu$ and $\text{Var}[V] = \Sigma$, then for a $(k \times k)$ matrix $A$ we have

$$\mathbb{E}[V'AV] = \text{trace}(A\Sigma) + \mu'A\mu.$$

It follows that

$$\mathbb{E}_{Y|X}[Y'(I_n - H)Y] = (n - p)\sigma^2.$$

Hence an unbiased estimator of $\sigma^2$ is

$$\widehat{\sigma}^2 = \frac{SS_{Res}}{n - p} = MS_{Res}$$

with $p = 2$.
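To fill in the intermediate step (this derivation is not on the slides, but uses only properties already stated): applying the quadratic form result with $V = Y$, $\mu = X\beta$, $\Sigma = \sigma^2 I_n$, and $A = I_n - H$,

$$\mathbb{E}_{Y|X}[Y'(I_n - H)Y] = \sigma^2\,\text{trace}(I_n - H) + (X\beta)'(I_n - H)(X\beta) = \sigma^2(n - p) + 0,$$

since $\text{trace}(H) = \text{trace}(X(X'X)^{-1}X') = \text{trace}((X'X)^{-1}X'X) = \text{trace}(I_p) = p$, and $HX = X$ implies $(I_n - H)X\beta = 0_n$.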

25 Estimating $\sigma^2$ (cont.)

Using similar methods,

$$\mathbb{E}_{Y|X}[SS_R \mid X] = \mathbb{E}_{Y|X}[Y'(H - H_1)Y \mid X] = (p - 1)\sigma^2 + \beta_1^2 S_{xx}$$

and

$$\mathbb{E}_{Y|X}[SS_T \mid X] = \mathbb{E}_{Y|X}[Y'(I_n - H_1)Y \mid X] = (n - 1)\sigma^2 + \beta_1^2 S_{xx}.$$

26 Standard errors for the estimators

We have that

$$\text{Var}_{Y|X}[\widehat{\beta} \mid X] = \sigma^2 (X'X)^{-1}$$

which is estimated by $\widehat{\sigma}^2 (X'X)^{-1}$. The standard errors of the estimators are estimated by the square roots of the diagonal elements of this matrix; denote them by $\text{e.s.e}(\widehat{\beta}_j)$, $j = 0, 1$.

27 Hypothesis Testing

We can formulate hypothesis tests for the parameters provided we make the normality assumption

$$\epsilon \mid X \sim \text{Normal}(0_n, \sigma^2 I_n).$$

For $j = 0, 1$, to test

$$H_0: \beta_j = 0 \quad \text{vs} \quad H_1: \beta_j \neq 0$$

we use the test statistic

$$t_j = \frac{\widehat{\beta}_j}{\text{e.s.e}(\widehat{\beta}_j)}.$$

28 Hypothesis Testing (cont.)

If $H_0$ is true, we have by standard distributional results that the corresponding statistic

$$T_j \sim \text{Student}(n - p)$$

with $p = 2$. We reject $H_0$ at significance level $\alpha$ if $|t_j| > t_{\alpha/2, n-p}$, where $t_{\alpha, \nu}$ is the $(1 - \alpha)$ quantile of the Student-$t$ distribution with $\nu$ degrees of freedom.
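A minimal sketch of the t-test computation using scipy (simulated data for illustration; `stats.t.sf` gives the upper-tail probability, so doubling it yields the two-sided p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
p = X.shape[1]

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)                  # MS_Res
ese = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

t_stats = beta_hat / ese
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - p)  # two-sided p-values
print(t_stats, p_values)
```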

29 Confidence Intervals

A $(1 - \alpha) \times 100\%$ confidence interval for $\beta_j$ is

$$\widehat{\beta}_j \pm t_{\alpha/2, n-p} \, \text{e.s.e}(\widehat{\beta}_j), \qquad j = 0, 1.$$

30 Global Model Adequacy

The $R^2$ statistic is defined by

$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_{Res}}{SS_T}$$

and is a measure of the global adequacy of $x$ as a predictor of $y$. The adjusted $R^2$ statistic is defined by

$$R^2_{Adj} = 1 - \frac{SS_{Res}/(n - p)}{SS_T/(n - 1)}$$

and is a measure that acknowledges that $SS_{Res}$ decreases in expectation as $p$ increases.

31 Local Model Adequacy

Residual plots are used to assess local model adequacy. If the model assumptions are correct, then the residual plots

$e_i$ vs $i$; $\quad e_i$ vs $x_{i1}$; $\quad e_i$ vs $\widehat{y}_i$

should be patternless, that is, they should not exhibit systematic patterns in either mean-level or variability. The residuals should form a horizontal band around zero, with equal variability around zero everywhere.

32 Prediction

Predictions from the model at a value of $x$ are formed by using the estimated regression coefficients; at $x = x_{i1}$ observed in the sample, the prediction equals the fitted value

$$\widehat{y}_i = \widehat{\beta}_0 + \widehat{\beta}_1 x_{i1}.$$

In vector form, we have $\widehat{y} = X\widehat{\beta}$. At a new value $x = x_1^{new}$, we have the prediction

$$\widehat{y}^{new} = \widehat{\beta}_0 + \widehat{\beta}_1 x_1^{new}.$$

33 Confidence and Prediction Intervals

In the random variable form we have predictions

$$\widehat{Y} = X\widehat{\beta} = HY$$

so that

$$\mathbb{E}_{Y|X}[\widehat{Y} \mid X] = X\beta \qquad \text{and} \qquad \text{Var}_{Y|X}[\widehat{Y} \mid X] = \sigma^2 X(X'X)^{-1}X' = \sigma^2 H.$$

Therefore a $(1 - \alpha) \times 100\%$ confidence interval for the prediction at $x = x_{i1}$ is

$$\widehat{y}_i \pm t_{\alpha/2, n-p} \sqrt{\widehat{\sigma}^2 h_{ii}}$$

where $h_{ii}$ is the $(i, i)$th diagonal element of $H$.

34 Confidence and Prediction Intervals (cont.)

For a prediction at $x = x_1^{new}$, with $x^{new} = [1 \;\; x_1^{new}]$, we have that

$$\text{Var}_{Y|X}[\widehat{Y}^{new} \mid X] = \sigma^2 x^{new} (X'X)^{-1} (x^{new})' = \sigma^2 h^{new}$$

and a $(1 - \alpha) \times 100\%$ confidence interval for the prediction at $x = x_1^{new}$ is

$$\widehat{y}^{new} \pm t_{\alpha/2, n-p} \sqrt{\widehat{\sigma}^2 h^{new}}.$$

35 Confidence and Prediction Intervals (cont.)

A prediction interval at $x = x_1^{new}$ incorporates the random variation that is present in the observations. Let

$$\widehat{Y}^{new}_O = \widehat{Y}^{new} + \epsilon^{new}$$

where $\epsilon^{new}$ is a zero-mean, variance-$\sigma^2$ random residual error, independent of all other random quantities. Then

$$\text{Var}_{Y|X}[\widehat{Y}^{new}_O \mid X] = \text{Var}_{Y|X}[\widehat{Y}^{new} \mid X] + \text{Var}_{Y|X}[\epsilon^{new} \mid X] = \sigma^2 h^{new} + \sigma^2 = \sigma^2 (1 + h^{new}).$$

Thus a $(1 - \alpha) \times 100\%$ prediction interval at $x = x_1^{new}$ is

$$\widehat{y}^{new} \pm t_{\alpha/2, n-p} \sqrt{\widehat{\sigma}^2 (1 + h^{new})}.$$
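A sketch of both interval computations at a new predictor value (simulated data; `x_new = 5.0` and $\alpha = 0.05$ are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
p = X.shape[1]

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)

x_new = np.array([1.0, 5.0])                    # row vector [1, x1_new]
y_new = x_new @ beta_hat
h_new = x_new @ np.linalg.inv(X.T @ X) @ x_new  # leverage of the new point
tq = stats.t.ppf(1 - 0.05 / 2, df=n - p)        # alpha = 0.05

ci = y_new + np.array([-1, 1]) * tq * np.sqrt(sigma2_hat * h_new)
pi = y_new + np.array([-1, 1]) * tq * np.sqrt(sigma2_hat * (1 + h_new))
print(ci, pi)   # the prediction interval is always wider
```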

36 The Analysis of Variance

The sums-of-squares decomposition

$$SS_T = SS_{Res} + SS_R$$

forms the basic component of the Analysis of Variance (ANOVA), as it describes how the overall observed variability in the response $y$ ($SS_T$) is decomposed into a component corresponding to the residual errors ($SS_{Res}$) and a component corresponding to the regression ($SS_R$).

37 The Analysis of Variance (cont.)

Under the assumption of Normality of the residual errors, $\epsilon \mid X \sim \text{Normal}(0_n, \sigma^2 I_n)$, and the hypothesis that $\beta_1 = 0$, we have, for the sums-of-squares random variables,

$$\frac{SS_T}{\sigma^2} = \frac{Y'(I_n - H_1)Y}{\sigma^2} \sim \chi^2_{n-1}, \qquad \frac{SS_{Res}}{\sigma^2} = \frac{Y'(I_n - H)Y}{\sigma^2} \sim \chi^2_{n-p}, \qquad \frac{SS_R}{\sigma^2} = \frac{Y'(H - H_1)Y}{\sigma^2} \sim \chi^2_{p-1}$$

with $SS_{Res}$ and $SS_R$ independent.

38 The Analysis of Variance (cont.)

Consequently we can show that, under the hypothesis, the random variable

$$F = \frac{SS_R/(p - 1)}{SS_{Res}/(n - p)}$$

has a Fisher-F distribution with $p - 1$ and $n - p$ degrees of freedom:

$$F \sim \text{Fisher}(p - 1, n - p).$$

We can construct a test of $H_0: \beta_1 = 0$ based on this result: we reject $H_0$ at significance level $\alpha$ if $F > F_{\alpha, p-1, n-p}$, where $F_{\alpha, \nu_1, \nu_2}$ is the $(1 - \alpha)$ quantile of the Fisher-F distribution with $\nu_1$ and $\nu_2$ degrees of freedom.
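A sketch of the global F-test computation (simulated data as in the earlier examples):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
p = X.shape[1]

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
SS_Res = resid @ resid
SS_T = np.sum((y - y.mean()) ** 2)
SS_R = SS_T - SS_Res

F = (SS_R / (p - 1)) / (SS_Res / (n - p))
p_value = stats.f.sf(F, p - 1, n - p)   # upper-tail (one-sided) test
print(F, p_value)
```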

39 The Analysis of Variance (cont.)

This test is equivalent to the test of $H_0: \beta_1 = 0$ based on the t-statistic; we have that

$$t_1^2 = \left\{\frac{\widehat{\beta}_1}{\text{e.s.e}(\widehat{\beta}_1)}\right\}^2 = F$$

and the two-tailed test based on $t_1$ is equivalent to the one-tailed test based on $F$.

40 The Analysis of Variance (cont.)

The ANOVA table arranges the required information in tabular form:

Source       SS        df       MS                F
Regression   SS_R      p - 1    SS_R/(p - 1)      F
Residual     SS_Res    n - p    SS_Res/(n - p)
Total        SS_T      n - 1

41 Maximum Likelihood Estimation

Under the assumption of Normality of the residual errors, $\epsilon \mid X \sim \text{Normal}(0_n, \sigma^2 I_n)$, so that

$$Y \mid X \sim \text{Normal}(X\beta, \sigma^2 I_n),$$

we may consider using a maximum likelihood (ML) procedure to estimate the model parameters $(\beta, \sigma^2)$. The likelihood function for data $y$ is

$$L(\beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left\{-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right\}.$$

42 Maximum Likelihood Estimation (cont.)

The log-likelihood is

$$\ell(\beta, \sigma^2) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta) + \text{constant} = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2} S(\beta) + \text{constant}.$$

We seek to maximize this function with respect to $\beta$ and $\sigma^2$. It is evident that, in terms of $\beta$, the maximum value of $\ell(\beta, \sigma^2)$ is attained (for any $\sigma^2$) when $S(\beta)$ is minimized. Thus the maximum likelihood estimate of $\beta$ is identical to the least squares estimate.

43 Maximum Likelihood Estimation (cont.)

To maximize over $\sigma^2$, note that

$$\frac{\partial \ell(\beta, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} S(\beta).$$

Equating to zero, and setting $\beta = \widehat{\beta}$, we have the solution

$$\widehat{\sigma}^2_{ML} = \frac{1}{n} S(\widehat{\beta}) = \frac{1}{n}(y - X\widehat{\beta})'(y - X\widehat{\beta}) = \frac{1}{n}\sum_{i=1}^{n} (y_i - \widehat{y}_i)^2 = \frac{SS_{Res}}{n}.$$
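The ML estimator divides by $n$ rather than $n - p$, so it is biased low: $\mathbb{E}[\widehat{\sigma}^2_{ML}] = \sigma^2 (n - p)/n$. A small Monte Carlo sketch illustrating this (all settings here are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 20, 1.0
x1 = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x1])
p = X.shape[1]
H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H                      # SS_Res = Y'(I - H)Y

ml, unbiased = [], []
for _ in range(20000):
    y = 2.0 + 0.5 * x1 + rng.normal(scale=np.sqrt(sigma2), size=n)
    ss_res = y @ M @ y
    ml.append(ss_res / n)
    unbiased.append(ss_res / (n - p))

print(np.mean(ml))        # close to sigma2 * (n - p)/n = 0.9
print(np.mean(unbiased))  # close to sigma2 = 1.0
```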

44 Multiple Linear Regression

The simple linear regression model extends to include multiple predictors, $\mathbb{E}_{Y|X}[Y \mid X] = X\beta$, in a straightforward way. Least squares estimation, inference, testing, prediction, etc. proceed in precisely the same way. Model checking using residuals and $R^2$ also proceeds as for simple linear regression. F-testing procedures for assessing the utility of including all the predictors proceed as before, and there is a natural extension to the use of ANOVA tables in regression.

45 Hypothesis Testing

However, now we may also consider individual t-tests for the coefficients, from a single fit. Furthermore, we may consider simultaneous tests for collections of parameters using the F-test. We split $X = [X^{(1)} \; X^{(2)}]$ and $\beta = [\beta^{(1)} \; \beta^{(2)}]$ and consider the model

$$Y = X\beta + \epsilon = X^{(1)}\beta^{(1)} + X^{(2)}\beta^{(2)} + \epsilon.$$

We compare a simpler model with a more complex model, where the simpler model is obtained by hypothesizing that some $\beta$s are zero, that is,

Full Model: $Y = X\beta + \epsilon$
Reduced Model: $Y = X^{(1)}\beta^{(1)} + \epsilon$

that is, under the reduced model we assume $\beta^{(2)} = 0_r$.

46 Hypothesis Testing (cont.)

In general the parameter estimates arising from the two models will be different; that is, the $\beta^{(1)}$ estimates from the Full model will in general not be equal to the $\beta^{(1)}$ estimates from the Reduced model, as in the Full model

$$\widehat{\beta} = \begin{bmatrix} \widehat{\beta}^{(1)} \\ \widehat{\beta}^{(2)} \end{bmatrix} = \begin{bmatrix} \{X^{(1)}\}'X^{(1)} & \{X^{(1)}\}'X^{(2)} \\ \{X^{(2)}\}'X^{(1)} & \{X^{(2)}\}'X^{(2)} \end{bmatrix}^{-1} \begin{bmatrix} \{X^{(1)}\}'y \\ \{X^{(2)}\}'y \end{bmatrix}$$

and in the Reduced model

$$\widehat{\beta}^{(1)} = (\{X^{(1)}\}'X^{(1)})^{-1}\{X^{(1)}\}'y.$$

47 Hypothesis Testing (cont.)

We modify the earlier sums of squares decomposition

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \widehat{y}_i)^2 + \sum_{i=1}^{n} (\widehat{y}_i - \bar{y})^2, \qquad SS_T = SS_{Res} + SS_R,$$

to the uncentered version

$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} (y_i - \widehat{y}_i)^2 + \sum_{i=1}^{n} \widehat{y}_i^2, \qquad SS_T^* = SS_{Res} + SS_R^*.$$

48 Hypothesis Testing (cont.)

Note that $SS_T = SS_T^* - n\bar{y}^2$ and $SS_R = SS_R^* - n\bar{y}^2$.

49 Hypothesis Testing (cont.)

If $SS_R(\beta)$ and $SS_R(\beta^{(1)})$ denote the regression sums of squares from the Full and Reduced models respectively, the extra sum of squares due to $\beta^{(2)}$ in the presence of $\beta^{(1)}$ is

$$SS_R(\beta^{(2)} \mid \beta^{(1)}) = SS_R(\beta) - SS_R(\beta^{(1)}).$$

This facilitates the F-test of the null hypothesis $H_0: \beta^{(2)} = 0_r$, that is, that the Reduced model is an adequate simplification of the Full model.

50 Hypothesis Testing (cont.)

We perform a partial F-test using

$$F = \frac{SS_R(\beta^{(2)} \mid \beta^{(1)})/r}{MS_{Res}(\beta)} = \frac{(SS_{Res}(\beta^{(1)}) - SS_{Res}(\beta))/r}{SS_{Res}(\beta)/(n - p)}$$

which is distributed as $\text{Fisher}(r, n - p)$ if $H_0$ is true.
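A sketch of the partial F-test for nested models, with the data generated under $H_0$ so the statistic should behave like a draw from Fisher($r$, $n - p$); the simulated design and $r = 2$ extra predictors are illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 60
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])  # reduced design
X2 = rng.normal(size=(n, 2))                            # r = 2 extra predictors
X_full = np.column_stack([X1, X2])
y = X1 @ np.array([1.0, 2.0]) + rng.normal(size=n)      # beta(2) = 0 holds

def ss_res(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

p, r = X_full.shape[1], X2.shape[1]
F = ((ss_res(X1, y) - ss_res(X_full, y)) / r) / (ss_res(X_full, y) / (n - p))
p_value = stats.f.sf(F, r, n - p)
print(F, p_value)
```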

51 Hypothesis Testing (cont.)

The test may be used sequentially: for example, if the parameter is $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)$, then we have

$$SS_R(\beta_0, \beta_1, \beta_2, \beta_3) = SS_R(\beta_0) + SS_R(\beta_1 \mid \beta_0) + SS_R(\beta_2 \mid \beta_0, \beta_1) + SS_R(\beta_3 \mid \beta_0, \beta_1, \beta_2)$$

and also

$$SS_R(\beta_1, \beta_2, \beta_3 \mid \beta_0) = SS_R(\beta_1 \mid \beta_0) + SS_R(\beta_2 \mid \beta_0, \beta_1) + SS_R(\beta_3 \mid \beta_0, \beta_1, \beta_2).$$

We also have $SS_R(\beta_1, \beta_2, \beta_3 \mid \beta_0) \neq SS_R(\beta_1, \beta_2, \beta_3)$.

52 Hypothesis Testing (cont.)

This latter decomposition allows us to test whether $X_1$ is worth including in the model; then whether $X_2$ is worth including, when $X_1$ is already included; then whether $X_3$ is worth including, when $X_1$ and $X_2$ are already included.

53 Hypothesis Testing (cont.)

Note: if the $X^{(1)}$ and $X^{(2)}$ blocks are orthogonal,

$$\{X^{(1)}\}'X^{(2)} = 0_{(p-r) \times r},$$

then the $\beta$ estimates from the Full and Reduced models are equal, and we have the identity $SS_R(\beta^{(1)} \mid \beta^{(2)}) = SS_R(\beta^{(1)})$.

54 Multicollinearity

Multicollinearity corresponds to dependence/correlation amongst the predictors. It:

- can lead to inflation of the variance of the estimators, if present;
- can be measured by variance inflation factors, equivalent to $R^2$ measures constructed for the predictors;
- can be computed from the correlation matrix of the predictors.

A sketch of the variance inflation factor computation appears below.
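The sketch regresses each predictor on the others and sets $VIF_j = 1/(1 - R_j^2)$; the simulated predictors make $x_1$ and $x_2$ strongly correlated, so their VIFs come out large (the `vif` helper is illustrative, not a library function):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
x1 = z + 0.1 * rng.normal(size=n)     # x1 and x2 nearly collinear
x2 = z + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
Xp = np.column_stack([x1, x2, x3])    # non-intercept predictors

def vif(Xp, j):
    """VIF_j = 1/(1 - R_j^2), from regressing predictor j on the others."""
    others = np.delete(Xp, j, axis=1)
    D = np.column_stack([np.ones(len(Xp)), others])
    beta, *_ = np.linalg.lstsq(D, Xp[:, j], rcond=None)
    resid = Xp[:, j] - D @ beta
    r2 = 1 - resid @ resid / np.sum((Xp[:, j] - Xp[:, j].mean()) ** 2)
    return 1 / (1 - r2)

print([round(vif(Xp, j), 1) for j in range(Xp.shape[1])])  # x1, x2 large; x3 near 1
```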

55 Special Types of Predictors

- polynomial terms;
- factor predictors: discrete predictors measured on a nominal (non-numerical) scale;
- interactions: an interaction is a modification of the effect of one predictor in the presence of another;
- higher-order interactions.

56 Special Types of Predictors (cont.)

Important issues include:

- how to represent factor predictors using indicator functions;
- how to count parameters;
- the interpretation of interactions;
- removing the intercept.

57 Model Selection Strategies

Our goal is to find the simplest possible model that adequately explains the observed response data.

- Forward selection;
- Backward elimination;
- Stepwise selection.

58 Model Selection Criteria

- $R^2$-based methods: largest $R^2_{Adj}$, equivalent to minimum $MS_{Res}$;
- Mallows's $C_p$;
- AIC;
- BIC.

A sketch of how these criteria can be computed for a fitted linear model follows.
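The AIC and BIC formulas below are the standard Gaussian linear model forms up to additive constants, and the `model_criteria` function name is illustrative:

```python
import numpy as np

def model_criteria(X, y):
    """Selection criteria for a Gaussian linear model (up to constants)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_t = np.sum((y - y.mean()) ** 2)
    aic = n * np.log(ss_res / n) + 2 * p
    bic = n * np.log(ss_res / n) + p * np.log(n)
    r2_adj = 1 - (ss_res / (n - p)) / (ss_t / (n - 1))
    return aic, bic, r2_adj
```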

59 Why the best model matters

Over-simplified model: if the chosen model omits important predictors, then

- the resulting estimates will be biased, have lower variance, and can have higher mean-squared error, depending on the magnitude of the effect of the omitted predictors;
- the estimate of $\sigma^2$ will be too high;
- predictions will have higher mean squared error.

If the omitted predictors are orthogonal to the included ones, then the bias is removed.

60 Why the best model matters (cont.)

Over-complex model: if the chosen model includes unnecessary predictors, then

- the resulting estimates will be unbiased, but have higher variance and higher mean-squared error;
- the estimate of $\sigma^2$ will be unbiased;
- predictions will have no bias, but higher variance and higher mean squared error.

61 Assessing the model fit

- PRESS residuals/statistic;
- leave-one-out/deletion residuals;
- leverage;
- influence in estimation and prediction;
- outliers;
- influence diagnostics.

62 Generalizing Least Squares

We may relax the variance assumption $\text{Var}_{Y|X}[Y \mid X] = \sigma^2 I_n$ to

$$\text{Var}_{Y|X}[Y \mid X] = \sigma^2 V$$

where $V$ is a square, symmetric, non-singular matrix. This allows for

- unequal variances in the conditional response distribution;
- correlation amongst the residual errors $\epsilon$.

This leads to the amended least squares objective function

$$(y - X\beta)'V^{-1}(y - X\beta).$$

63 Generalizing Least Squares (cont.)

This leads to the estimates

$$\widehat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}y$$

and the properties of the estimators, predictions, etc. follow in a straightforward fashion. A special case is weighted least squares, where

$$V = \text{diag}(1/w_1, 1/w_2, \ldots, 1/w_n)$$

which accommodates unequal variances.
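A sketch of weighted least squares with known weights (simulated heteroscedastic data); the GLS formula and an equivalent ordinary least squares fit on $\sqrt{w_i}$-scaled data give the same estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 80
x1 = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x1])
w = 1.0 / (1.0 + x1)                          # Var(y_i) = sigma^2 / w_i
y = 2.0 + 0.5 * x1 + rng.normal(scale=1.0 / np.sqrt(w))

# GLS with V = diag(1/w): beta = (X'V^{-1}X)^{-1} X'V^{-1} y
Vinv = np.diag(w)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

# Equivalent: ordinary least squares on sqrt(w)-scaled data
sw = np.sqrt(w)
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
assert np.allclose(beta_gls, beta_wls)
print(beta_gls)
```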

64 Transforming the response

We may use transformations of the response $y_i$ to a different form to make the linear regression model assumptions more appropriate:

- log;
- square root;
- Box-Cox transform.
